Re: Tsvart last call review of draft-ietf-bfd-vxlan-07

Greg Mirsky <gregimirsky@gmail.com> Thu, 20 June 2019 01:58 UTC

References: <155933149484.6565.7386019489022348116@ietfa.amsl.com> <CA+RyBmXu-F0cWDkBydE_aJaVpUv=k1otqUCc7NdRW4pnBK3tgA@mail.gmail.com> <14822B96-D3C6-495E-8661-198068F72ABA@cisco.com> <CA+RyBmUMbW=B3FNmqiQNmMLM27f9G+MeRL5MrAnCd04EP3vmrQ@mail.gmail.com> <8237FE8D-937E-4BCB-B1A3-89C2B3CDC51C@cisco.com>
In-Reply-To: <8237FE8D-937E-4BCB-B1A3-89C2B3CDC51C@cisco.com>
From: Greg Mirsky <gregimirsky@gmail.com>
Date: Thu, 20 Jun 2019 10:58:32 +0900
Message-ID: <CA+RyBmXAQ1esWKa8C6cgnn93YTyTh=JFUt187TS56bNND1OJOA@mail.gmail.com>
Subject: Re: Tsvart last call review of draft-ietf-bfd-vxlan-07
To: "Carlos Pignataro (cpignata)" <cpignata@cisco.com>
Cc: "tsv-art@ietf.org" <tsv-art@ietf.org>, rtg-bfd WG <rtg-bfd@ietf.org>, "draft-ietf-bfd-vxlan.all@ietf.org" <draft-ietf-bfd-vxlan.all@ietf.org>, Olivier Bonaventure <Olivier.Bonaventure@uclouvain.be>, IETF list <ietf@ietf.org>
Content-Type: multipart/alternative; boundary="0000000000008e2644058bb7af93"
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/AFmhkpgkBXwMjMOLZ-BgbmHfrxw>

Hello Carlos,
thank you for the prompt clarification.
To your questions on demultiplexing BFD control packets with the zero value
of the Your Discriminator field:

   - only BFD control packets with the zero value of the Your Discriminator
   field are demultiplexed using the information of the inner IP header. I
   believe that the text is clear and requires that all fields of the inner IP
   header must be used to demultiplex a received BFD control packet with the
   zero value in the Your Discriminator field. Which of the fields an
   implementation uses to create multiple BFD sessions between the pair of
   VTEPs is implementation specific.
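
As an illustration only (this is not text from the draft), the
demultiplexing described above could be sketched roughly as follows. The
class and field names are hypothetical, and the choice of key fields is an
assumption: it follows the Section 5.1 wording quoted later in this thread
(inner source IP, inner destination IP, inner source UDP port, with the VNI
supplying interface-related context).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class InnerKey:
    """Hypothetical lookup key built from the inner headers plus the VNI."""
    vni: int
    src_ip: str
    dst_ip: str
    src_udp_port: int


@dataclass
class BfdSession:
    """Minimal stand-in for BFD session state (illustrative only)."""
    local_discriminator: int
    key: InnerKey


class SessionTable:
    """Illustrative session table; not an API from any real BFD stack."""

    def __init__(self) -> None:
        self._by_discriminator: Dict[int, BfdSession] = {}
        self._by_key: Dict[InnerKey, BfdSession] = {}

    def add(self, session: BfdSession) -> None:
        self._by_discriminator[session.local_discriminator] = session
        self._by_key[session.key] = session

    def demux(self, your_discriminator: int, key: InnerKey) -> Optional[BfdSession]:
        # A non-zero Your Discriminator identifies the session directly
        # (RFC 5880, Section 6.3).
        if your_discriminator != 0:
            return self._by_discriminator.get(your_discriminator)
        # Zero Your Discriminator: fall back to the inner-header fields.
        return self._by_key.get(key)
```

In this sketch, two sessions between the same pair of VTEPs would have to
differ in at least one key field (for example, the inner source UDP port);
which field an implementation varies is left open, as noted above.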

To your point on the extent to which the Echo mode of BFD is specified in
RFC 5880, I'll quote the opinion of Jeffrey Haas from the discussion of the
comments Shawn Emery made on behalf of the SecDir. Shawn had commented:

Echo BFD is out of scope for the document, but does not describe the reason
for this or why state this at all?

I've responded:

GIM>> I think that the main reason is that the BFD Echo mode is
underspecified. RFC 5880 defined some of the mechanisms related to the Echo
mode, but more standardization work may be required.

And Jeffrey Haas had added:

Speaking as a BFD chair, this is the relevant observation.  BFD Echo is
underspecified to the point where claiming compliance is difficult at best.
In general, it relies on single-hop and the ability to have the remote Echo
client loop the packets.

This packet loop may not be practical for several encapsulations and thus is
out of scope for such encapsulations.  Whether this is practical for vxlan
today, or in the presence of future extensions to vxlan is left out of scope
for the core proposal.

Will respond to other questions in a separate mail.

Regards,
Greg


On Thu, Jun 20, 2019 at 10:31 AM Carlos Pignataro (cpignata) <cpignata@cisco.com> wrote:

> Hello, Greg,
>
> > On Jun 19, 2019, at 9:09 PM, Greg Mirsky <gregimirsky@gmail.com> wrote:
> >
> > Hi Carlos,
> > thank you for reminding me of our continued discussion with Joel. We are
> seeking comments from VXLAN experts and would much appreciate any
> insights on VXLAN you can share.
> > I've got some clarifying questions before I can respond to you.
>
> Sure.
>
> > To which stage of the three-way handshake do you refer as "initial
> demultiplexing"? I couldn't find this term in RFC 5880.
>
> "Initial demultiplexing" is a well-known term in BFD, referring to the
> "demultiplexing of the initial packets", i.e., BFD Control packets with
> YourDisc being zero.
>
> In RFC 5880, see Section 6.3.
> https://tools.ietf.org/html/rfc5880#section-6.3
>
>    The method of demultiplexing the initial packets (in which Your
>    Discriminator is zero) is application dependent, and is thus outside
>    the scope of this specification.
>
> Since initial demultiplexing is indeed application specific, different for
> one-hop versus multi-hop and dependent upon whether a single or multiple
> sessions are allowed between a pair of endpoints, I added below two other
> relevant citations, from application specific BFD specs:
> 1. https://tools.ietf.org/html/rfc5883#section-4
> 2. https://tools.ietf.org/html/rfc5882#section-6
>
>
> > Regarding the applicability of the Echo mode, thank you for pointing to
> the need for stricter terminology, the Echo mode, as defined in RFC 5880,
> is underspecified and it will require additional standardization.
>
> No. BFD Echo is not underspecified in RFC 5880.
>
> Please read S5: https://tools.ietf.org/html/rfc5880#section-5
>
>    BFD Echo packets are sent in an encapsulation appropriate to the
>    environment.  See the appropriate application documents for the
>    specifics of particular environments.
>
>
> BFD Echo is application dependent.
>
> Therefore, for example, single-hop BFD in RFC 5881 specifies BFD Echo for
> that application.
>
> Hence, my question stands: why is this draft claiming BFD Echo is out of
> scope for this BFD application document?
>
>
> > Future drafts may explore and define how the Echo mode of BFD is used
> over VXLAN tunnels.
> >
>
> See above.
>
> > Will review and respond to the remaining questions soon.
>
> Thank you.
>
> The "remaining questions" are still all the questions below :-)
>
> Best,
>
> Carlos.
>
> >
> > Regards,
> > Greg
> >
> >
> > On Thu, Jun 20, 2019 at 9:14 AM Carlos Pignataro (cpignata) <cpignata@cisco.com> wrote:
> > Hi,
> >
> > I have not reviewed this draft before, but triggered by this email, and
> briefly scanning through a couple of sections, it is unclear to me how some
> of the mechanics work.
> >
> > There are some major issues with the MAC usage and association, as Joel
> Halpern mentioned in his Rtg Dir review.
> >
> > And, additionally, please consider the following comments and questions:
> >
> >
> > 1. Underspecification for initialization and initial demultiplexing.
> >
> > This document allows multiple BFD sessions between a single pair of
> VTEPs:
> >
> >    An
> >    implementation that supports this specification MUST be able to
> >    control the number of BFD sessions that can be created between the
> >    same pair of VTEPs.
> >
> > The implication of this is that BFD single-hop initialization procedures
> will not work. Instead, there is a need to map the initial demultiplexing.
> >
> > This issue is explained in RFCs 5882 and 5883:
> https://tools.ietf.org/html/rfc5883#section-4 and
> https://tools.ietf.org/html/rfc5882#section-6
> >
> > Section 5.1 says:
> >
> >    For such packets, the BFD session MUST be identified
> >    using the inner headers, i.e., the source IP, the destination IP, and
> >    the source UDP port number present in the IP header carried by the
> >    payload of the VXLAN encapsulated packet.  The VNI of the packet
> >    SHOULD be used to derive interface-related information for
> >    demultiplexing the packet.
> >
> > But this does not really explain how to do the initial demultiplexing.
> Does each BFD session need to have a separate inner source IP address? Or
> source UDP port? And how often are they recycled or kept as state? How are
> these mapped?
> > Equally importantly, which side is Active?
> > And what if there’s a race condition with both sides being Active and
> setting up redundant sessions?
> >
> > 1.b. By the way, based on this, using S-BFD [RFC 7880] might be easier
> to demux.
> >
> >
> > 2. Security
> >
> > This document says that the TTL in the inner packet carrying BFD is set
> to 1. However, RFC 5880 says to use GTSM [RFC 5082], i.e., a value of 255.
> >
> > Why is GTSM not used here?
> >
> >
> > 3. ECMP and fate-sharing under-specification:
> >
> > Section 4.1. says:
> >
> >    The Outer IP/UDP
> >    and VXLAN headers MUST be encoded by the sender as defined in
> >    [RFC7348].
> >
> >
> > And RFC 7348 says:
> >
> >       -  Source Port:  It is recommended that the UDP source port number
> >          be calculated using a hash of fields from the inner packet --
> >          one example being a hash of the inner Ethernet frame's headers.
> >          This is to enable a level of entropy for the ECMP/load-
> >          balancing of the VM-to-VM traffic across the VXLAN overlay.
> >          When calculating the UDP source port number in this manner, it
> >          is RECOMMENDED that the value be in the dynamic/private port
> >          range 49152-65535 [RFC6335].
> >
> >
> > Based on this, depending on the hashing calculation, the outer source
> UDP port can be different, leading to different ECMP treatment. Does
> something else need to be specified here with regard to the outer UDP source
> port?
> >
> >
> > 4. Section 7 says that “Support for echo BFD is outside the scope of
> this document”.
> >
> > Assuming this means “BFD Echo mode”, why is this out of scope? If this
> is a single logical hop underneath VXLAN, what’s preventing the use of
> Echo? Echo’s benefits are huge.
> >
> >
> > 5. Terminology
> >
> >    Implementations SHOULD ensure that the BFD
> >    packets follow the same lookup path as VXLAN data packets within the
> >    sender system.
> >
> > What is a “look up path within a sender system”?
> >
> >
> > 6. Deployment scenarios
> >
> > S3 says:
> >    Figure 1 illustrates the scenario with two servers, each of them
> >    hosting two VMs.  The servers host VTEPs that terminate two VXLAN
> > […]
> >                      Figure 1: Reference VXLAN Domain
> >
> >
> > However, RFC 7348 Figure 3 lists that as one deployment scenario, not as
> “the scenario” and “The Reference VXLAN Domain”.
> >
> > Best,
> >
> > Carlos.
> >
> >> On Jun 17, 2019, at 12:58 AM, Greg Mirsky <gregimirsky@gmail.com> wrote:
> >>
> >> Hi Olivier,
> >> thank you for your thorough review and your clear and detailed questions. My
> apologies for the delay in responding. Please find my answers below in-line
> tagged GIM>>.
> >>
> >> Regards,
> >> Greg
> >>
> >> On Fri, May 31, 2019 at 12:38 PM Olivier Bonaventure via Datatracker <noreply@ietf.org> wrote:
> >> Reviewer: Olivier Bonaventure
> >> Review result: Ready with Issues
> >>
> >> This document has been reviewed as part of the transport area review
> team's
> >> ongoing effort to review key IETF documents. These comments were written
> >> primarily for the transport area directors, but are copied to the
> document's
> >> authors and WG to allow them to address any issues raised and also to
> the IETF
> >> discussion list for information.
> >>
> >> When done at the time of IETF Last Call, the authors should consider
> this
> >> review as part of the last-call comments they receive. Please always CC
> >> tsv-art@ietf.org if you reply to or forward this review.
> >>
> >> I have only limited knowledge of VXLAN and do not know all subtleties
> of BFD.
> >> This review is thus more from a generalist than a specialist in this
> topic.
> >>
> >> Major issues
> >>
> >> Section 4 requires that " Implementations SHOULD ensure that the BFD
> >>    packets follow the same lookup path as VXLAN data packets within the
> >>    sender system."
> >>
> >> Why is this requirement only relevant for the lookup path on the sender
> system? What does this sentence really imply?
> >> GIM>> RFC 5880 set the scope of the fault detection of BFD protocol as
> >>    ... the bidirectional path between two forwarding engines, including
> >>    interfaces, data link(s), and to the extent possible the forwarding
> >>    engines themselves ...
> >> The requirement is aimed at the forwarding engine of a BFD system that
> transmits BFD control packets over a VXLAN tunnel.
> >>
> >> Is it a requirement that the BFD packets follow the same path as the
> data
> >> packet for a given VXLAN ? I guess so. In this case, the document should
> >> discuss how Equal Cost Multipath could affect this.
> >> GIM>> I think that an ECMP environment is more likely to be experienced by
> a transit node in the underlay. If the BFD session is used to monitor the
> specific underlay path, then, I agree, we should explain that using the
> VXLAN payload information to draw path entropy may cause data and BFD
> packets to follow different underlay routes. But, on the other hand, that
> is the case for OAM and fault detection in all overlay networks in general.
> >>
> >> Minor issues
> >>
> >> Section 1
> >>
> >> You write "The asynchronous mode of BFD, as defined in [RFC5880],
> >>  can be used to monitor a p2p VXLAN tunnel."
> >>
> >> Why do you use the word "can"? Is it a possibility or a requirement?
> >> GIM>> In principle, BFD Demand mode may be used to monitor p2p paths as
> well. I agree and will re-word to the more assertive:
> >>  The asynchronous mode of BFD, as defined in [RFC5880],
> >>  is used to monitor a p2p VXLAN tunnel.
> >>
> >> NVE has not been defined before and is not in the terminology.
> >> GIM>> Will add to the Terminology and expand as:
> >> NVE        Network Virtualization Endpoint
> >>
> >> This entire section is not easy to read for an outsider.
> >>
> >> Section 3
> >>
> >> VNI has not been defined
> >> GIM>> Will add to the Terminology section:
> >> VNI    VXLAN Network Identifier (or VXLAN Segment ID)
> >>
> >> Figure 1 could take less space
> >> GIM>> Yes, I can make it a bit denser. Would the following be an
> improvement?
> >>
> >>       +------------+-------------+
> >>       |        Server 1          |
> >>       | +----+----+  +----+----+ |
> >>       | |VM1-1    |  |VM1-2    | |
> >>       | |VNI 100  |  |VNI 200  | |
> >>       | |         |  |         | |
> >>       | +---------+  +---------+ |
> >>       | Hypervisor VTEP (IP1)    |
> >>       +--------------------------+
> >>                             |
> >>                             |   +-------------+
> >>                             |   |   Layer 3   |
> >>                             +---|   Network   |
> >>                                 +-------------+
> >>                                     |
> >>                                     +-----------+
> >>                                                 |
> >>                                          +------------+-------------+
> >>                                          |    Hypervisor VTEP (IP2) |
> >>                                          | +----+----+  +----+----+ |
> >>                                          | |VM2-1    |  |VM2-2    | |
> >>                                          | |VNI 100  |  |VNI 200  | |
> >>                                          | |         |  |         | |
> >>                                          | +---------+  +---------+ |
> >>                                          |      Server 2            |
> >>                                          +--------------------------+
> >>
> >>
> >> Section 4
> >>
> >> I do not see the benefits of having one paragraph in Section 4 followed
> by only
> >> Section 4.1
> >> GIM>> Will merge Section 4.1 into 4 with minor required re-wording:
> >> 4.  BFD Packet Transmission over VXLAN Tunnel
> >>
> >>    BFD packet MUST be encapsulated and sent to a remote VTEP as
> >>    explained in this section.  Implementations SHOULD ensure that the
> >>    BFD packets follow the same lookup path as VXLAN data packets within
> >>    the sender system.
> >>
> >>    BFD packets are encapsulated in VXLAN as described below.  The VXLAN
> >>    packet format is defined in Section 5 of [RFC7348].  The Outer IP/UDP
> >>    and VXLAN headers MUST be encoded by the sender as defined in
> >>    [RFC7348].
> >>
> >> Section 4.1
> >>
> >> The document does not specify when a dedicated MAC address or the MAC
> address
> >> of the destination VTEP must be used. This could affect the
> interoperability of
> >> implementations. Should all implementations support both the dedicated
> MAC
> >> address and the destination MAC address ?
> >> GIM>> After further discussion, the authors decided to remove the request
> for the dedicated MAC address allocation. Only the MAC address of the
> remote VTEP must be used as the destination MAC address in the inner
> Ethernet frame. Please check the attached diff between the -07 and the
> working versions, or the working version of the draft itself.
> >>
> >> It is unclear from this section whether IPv4 inside IPv6 and the
> opposite
> >> should be supported or not.
> >> GIM>> Any combination of outer IPvX and inner IPvX is possible.
> >>
> >> Section 5.
> >>
> >> If the received packet does not match the dedicated MAC address nor the
> MAC
> >> address of the VTEP, should the packet be silently discarded or treated
> >> differently ?
> >> GIM>> As I've mentioned earlier, the authors have decided to remove the use
> of the dedicated MAC address for BFD over VXLAN.
> >>
> >> Section 5.1
> >>
> >> Is this a modification to section 6.3 of RFC5880 ? This is not clear
> >> GIM>> I think that this section is not a modification but the definition
> of the application-specific procedure that is outside the scope of RFC 5880:
> >>    The method of demultiplexing the initial packets (in which Your
> >>    Discriminator is zero) is application dependent, and is thus outside
> >>    the scope of this specification.
> >>
> >> Section 9
> >>
> >> The sentence " Throttling MAY be relaxed for BFD packets
> >>    based on port number." is unclear.
> >> GIM>> Yes, thank you for pointing to this. The updated text, in the
> whole paragraph, is as follows:
> >> NEW TEXT:
> >>    The document requires setting the inner IP TTL to 1, which could be
> >>    used as a DDoS attack vector.  Thus the implementation MUST have
> >>    throttling in place to control the rate of BFD control packets sent
> >>    to the control plane.  On the other hand, over aggressive throttling
> >>    of BFD control packets may become the cause of the inability to form
> >>    and maintain BFD session at scale.  Hence, throttling of BFD control
> >>    packets SHOULD be adjusted to permit BFD to work according to its
> >>    procedures.
> >> <draft-ietf-bfd-vxlan-08.txt><Diff_ draft-ietf-bfd-vxlan-07.txt -
> draft-ietf-bfd-vxlan-08.txt.html>
> >
>
>