Re: WGLC comments on draft-ietf-bfd-vxlan

Greg Mirsky <gregimirsky@gmail.com> Thu, 22 November 2018 00:36 UTC

From: Greg Mirsky <gregimirsky@gmail.com>
Date: Wed, 21 Nov 2018 16:36:18 -0800
Message-ID: <CA+RyBmVCbz7yw=97QVek5RM89PfqkcBijCNE8tPWdFdfgrvX3w@mail.gmail.com>
Subject: Re: WGLC comments on draft-ietf-bfd-vxlan
To: Anoop Ghanwani <anoop@alumni.duke.edu>
Cc: rtg-bfd WG <rtg-bfd@ietf.org>, NVO3 <nvo3@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/7apf9TYPVFbFJK_Yq8slHHhHhUs>

Hi Anoop,
apologies for missing this. Is it the last outstanding item? Let's bring it
to the front, then.

- What is the benefit of running BFD per VNI between a pair of VTEPs?
>>>>
>>> GIM2>> An alternative would be to run CFM between VMs, if there's the
>>> need to monitor liveliness of the particular VM. Again, this is optional.
>>>
>>
>> [ag2] I'm not sure how running per-VNI BFD between the VTEPs allows one
>> to monitor the liveliness of VMs.
>>
>
[ag3] I think you missed responding to this.  I'm not sure of the value of
running BFD per VNI between VTEPs.  What am I getting that is not covered
by running a single BFD session with VNI 0 between the VTEPs?

GIM3>> I misspoke. A non-zero VNI is recommended for demultiplexing BFD
sessions between the same pair of VTEPs. From Section 6.1:
   The procedure for demultiplexing
   packets with Your Discriminator equal to 0 is different from
   [RFC5880].  For such packets, the BFD session MUST be identified
   using the inner headers, i.e., the source IP and the destination IP
   present in the IP header carried by the payload of the VXLAN
   encapsulated packet.  The VNI of the packet SHOULD be used to derive
   interface-related information for demultiplexing the packet.

I hope that clarifies the use of a non-zero VNI in the VXLAN encapsulation
of a BFD control packet.
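
As an illustration only (this is not text from the draft), the lookup
described in the quoted Section 6.1 text could be sketched in Python as
below. The names BfdSession, SessionTable and demux are hypothetical, and
using the VNI directly as part of the lookup key is an assumption about
how "interface-related information" might be derived from it.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Dict, Optional, Tuple


@dataclass
class BfdSession:
    """Hypothetical, minimal stand-in for BFD session state."""
    name: str
    local_discriminator: int


class SessionTable:
    """Sketch of demultiplexing received BFD-over-VXLAN control packets."""

    def __init__(self) -> None:
        self._by_discriminator: Dict[int, BfdSession] = {}
        self._by_inner_key: Dict[Tuple[int, object, object], BfdSession] = {}

    def add(self, session: BfdSession, vni: int,
            inner_src: str, inner_dst: str) -> None:
        # inner_src/inner_dst are the addresses expected in the inner IP
        # header of packets received for this session.
        self._by_discriminator[session.local_discriminator] = session
        key = (vni, ip_address(inner_src), ip_address(inner_dst))
        self._by_inner_key[key] = session

    def demux(self, your_discriminator: int, vni: int,
              inner_src: str, inner_dst: str) -> Optional[BfdSession]:
        # Non-zero Your Discriminator: select the session by that value,
        # as in RFC 5880.
        if your_discriminator != 0:
            return self._by_discriminator.get(your_discriminator)
        # Your Discriminator == 0: fall back to the inner IP addresses,
        # with the VNI (assumption) separating sessions between the same
        # pair of VTEPs.
        key = (vni, ip_address(inner_src), ip_address(inner_dst))
        return self._by_inner_key.get(key)


table = SessionTable()
table.add(BfdSession("vni-100", 1), 100, "192.0.2.2", "192.0.2.1")
table.add(BfdSession("vni-200", 2), 200, "192.0.2.2", "192.0.2.1")
print(table.demux(0, 200, "192.0.2.2", "192.0.2.1"))
```

With identical inner addresses on both sessions, only the VNI tells the
two apart in this sketch while Your Discriminator is still 0.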

Regards,
Greg

On Tue, Nov 20, 2018 at 12:14 PM Anoop Ghanwani <anoop@alumni.duke.edu>
wrote:

> Hi Greg,
>
> Please see inline prefixed by [ag3].
>
> Thanks,
> Anoop
>
> On Fri, Nov 16, 2018 at 5:29 PM Greg Mirsky <gregimirsky@gmail.com> wrote:
>
>> Hi Anoop,
>> thank you for the discussion. Please find my responses tagged GIM3>>.
>> Also, attached diff and the updated working version of the draft. Hope
>> we're converging.
>>
>> Regards,
>> Greg
>>
>> On Wed, Nov 14, 2018 at 11:00 PM Anoop Ghanwani <anoop@alumni.duke.edu>
>> wrote:
>>
>>> Hi Greg,
>>>
>>> Please see inline prefixed with [ag2].
>>>
>>> Thanks,
>>> Anoop
>>>
>>> On Wed, Nov 14, 2018 at 9:45 AM Greg Mirsky <gregimirsky@gmail.com>
>>> wrote:
>>>
>>>> Hi Anoop,
>>>> thank you for the expedient response. I am glad that some of my
>>>> responses have addressed your concerns. Please find followup notes in-line
>>>> tagged GIM2>>. I've attached the diff to highlight the updates applied in
>>>> the working version. Let me know if these are acceptable changes.
>>>>
>>>> Regards,
>>>> Greg
>>>>
>>>> On Tue, Nov 13, 2018 at 12:30 PM Anoop Ghanwani <anoop@alumni.duke.edu>
>>>> wrote:
>>>>
>>>>> Hi Greg,
>>>>>
>>>>> Please see inline prefixed with [ag].
>>>>>
>>>>> Thanks,
>>>>> Anoop
>>>>>
>>>>> On Tue, Nov 13, 2018 at 11:34 AM Greg Mirsky <gregimirsky@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Anoop,
>>>>>> many thanks for the thorough review and detailed comments. Please
>>>>>> find my answers, this time for real, in-line tagged GIM>>.
>>>>>>
>>>>>> Regards,
>>>>>> Greg
>>>>>>
>>>>>> On Thu, Nov 8, 2018 at 1:58 AM Anoop Ghanwani <anoop@alumni.duke.edu>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> Here are my comments.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Anoop
>>>>>>>
>>>>>>> ==
>>>>>>>
>>>>>>> Philosophical
>>>>>>>
>>>>>>> Since VXLAN is not an IETF standard, should we be defining a
>>>>>>> standard for running BFD on it?  Should we define BFD over Geneve instead
>>>>>>> which is the official WG selection?  Is that going to be a separate
>>>>>>> document?
>>>>>>> GIM>> IS-IS is not on the Standards Track either, but that has not
>>>>>>> prevented the IETF from developing tens of Standards Track RFCs using RFC 1142
>>>>>>> as the normative reference until RFC 7142 reclassified it as Historic. A
>>>>>>> similar path was followed with IS-IS-TE by publishing RFC 3784 until it was
>>>>>>> obsoleted by RFC 5305 four years later. I understand that a Down Reference,
>>>>>>> i.e., using an Informational RFC as a normative reference, is not an unusual
>>>>>>> situation.
>>>>>>>
>>>>>>
>>>>> [ag] OK.  I'm not an expert on this part so unless someone else that
>>>>> is an expert (chairs, AD?) can comment on it, I'll just let it go.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Technical
>>>>>>>
>>>>>>> Section 1:
>>>>>>>
>>>>>>> This part needs to be rewritten:
>>>>>>> >>>
>>>>>>> The individual racks may be part of a different Layer 3 network, or
>>>>>>> they could be in a single Layer 2 network. The VXLAN segments/overlays are
>>>>>>> overlaid on top of Layer 3 network. A VM can communicate with another VM
>>>>>>> only if they are on the same VXLAN segment.
>>>>>>> >>>
>>>>>>> It's hard to parse and, given IRB,
>>>>>>>
>>>>>> GIM>> Would the following text be acceptable:
>>>>>> OLD TEXT:
>>>>>>    VXLAN is typically deployed in data centers interconnecting
>>>>>>    virtualized hosts, which may be spread across multiple racks.  The
>>>>>>    individual racks may be part of a different Layer 3 network, or they
>>>>>>    could be in a single Layer 2 network.  The VXLAN segments/overlays
>>>>>>    are overlaid on top of Layer 3 network.
>>>>>> NEW TEXT:
>>>>>> VXLAN is typically deployed in data centers interconnecting virtualized
>>>>>> hosts of a tenant. VXLAN addresses requirements of the Layer 2 and
>>>>>> Layer 3 data center network infrastructure in the presence of VMs in
>>>>>> a multi-tenant environment, discussed in Section 3 of [RFC7348], by
>>>>>> providing a Layer 2 overlay scheme on a Layer 3 network.
>>>>>>
>>>>>
>>>>> [ag] This is a lot better.
>>>>>
>>>>>
>>>>>>
>>>>>>  A VM can communicate with another VM only if they are on the same
>>>>>> VXLAN segment.
>>>>>>>
>>>>>>> the last sentence above is wrong.
>>>>>>>
>>>>>> GIM>> Section 4 in RFC 7348 states:
>>>>>> Only VMs within the same VXLAN segment can communicate with each
>>>>>> other.
>>>>>>
>>>>>
>>>>> [ag] VMs on different segments can communicate using routing/IRB, so
>>>>> even RFC 7348 is wrong.  Perhaps the text should be modified to say -- "In
>>>>> the absence of a router in the overlay, a VM can communicate...".
>>>>>
>>>>>
>>>>>>
>>>>>> Section 3:
>>>>>>> >>>
>>>>>>>  Most deployments will have VMs with only L2 capabilities that
>>>>>>> may not support L3.
>>>>>>> >>>
>>>>>>> Are you suggesting most deployments have VMs with no IP
>>>>>>> addresses/configuration?
>>>>>>>
>>>>>> GIM>> Would re-word as follows:
>>>>>> OLD TEXT:
>>>>>>  Most deployments will have VMs with only L2 capabilities that
>>>>>>  may not support L3.
>>>>>> NEW TEXT:
>>>>>> Deployments may have VMs with only L2 capabilities that do not
>>>>>> support L3.
>>>>>>
>>>>>
>>>>> [ag] I still don't understand this.  What does it mean for a VM to not
>>>>> support L3?  No IP address, no default GW, something else?
>>>>>
>>>> GIM2>> A VM communicates with its VTEP which, in turn, originates the VXLAN
>>>> tunnel. A VM is not required to have an IP address, as it is the VTEP's IP
>>>> address that the VM's MAC is associated with. As for the gateway, RFC 7348
>>>> discusses the VXLAN gateway as the device that forwards traffic between VXLAN
>>>> and non-VXLAN domains. Considering all that, would the following change be acceptable:
>>>> OLD TEXT:
>>>>  Most deployments will have VMs with only L2 capabilities that
>>>>  may not support L3.
>>>> NEW TEXT:
>>>>  Most deployments will have VMs with only L2 capabilities and not have
>>>> an IP address assigned.
>>>>
>>>
>>> [ag2] Do you have a reference for this (i.e. that most deployments have
>>> VMs without an IP address)?  Normally I would think VMs would have an IP
>>> address.  It's just that they are segregated into segments and, without an
>>> intervening router, they are restricted to communicate only within their
>>> subnet.
>>>
>> GIM3>> Would the following text be acceptable:
>>
>> Deployments might have VMs with only L2 capabilities and not have an IP
>> address assigned or,
>> in other cases, VMs are assigned IP address but are restricted to
>> communicate only within their subnet.
>>
>>
> [ag3] Yes, this is better.
>
>
>>>>>
>>>>>>
>>>>>>> >>>
>>>>>>> Having a hierarchical OAM model helps localize faults though it
>>>>>>> requires additional consideration.
>>>>>>> >>>
>>>>>>> What are the additional considerations?
>>>>>>>
>>>>>> GIM>> For example, coordination of BFD intervals across the OAM
>>>>>> layers.
>>>>>>
>>>>>
>>>>> [ag] Can we mention them in the draft?
>>>>>
>>>>>
>>>>>>
>>>>>>> Would be useful to add a reference to RFC 8293 in case the reader
>>>>>>> would like to know more about service nodes.
>>>>>>>
>>>>>> GIM>> I have to admit that I don't see how RFC 8293, "A Framework for
>>>>>> Multicast in Network Virtualization over Layer 3", is related to this
>>>>>> document. Please help by pointing to the relevant text of the
>>>>>> document.
>>>>>>
>>>>>
>>>>> [ag] The RFC discusses the use of service nodes which is mentioned
>>>>> here.
>>>>>
>>>>>
>>>>>>
>>>>>>> Section 4
>>>>>>> >>>
>>>>>>> Separate BFD sessions can be established between the VTEPs (IP1 and
>>>>>>> IP2) for monitoring each of the VXLAN tunnels (VNI 100 and 200).
>>>>>>> >>>
>>>>>>> IMO, the document should mention that this could lead to scaling
>>>>>>> issues given that VTEPs can support well in excess of 4K VNIs.
>>>>>>> Additionally, we should mention that with IRB, a given VNI may not even
>>>>>>> exist on the destination VTEP.  Finally, what is the benefit of doing
>>>>>>> this?  There may be certain corner cases where it's useful (vs a single BFD
>>>>>>> session between the VTEPs for all VNIs) but it would be good to explain
>>>>>>> what those are.
>>>>>>>
>>>>>> GIM>> Will add text in the Security Considerations section that VTEPs
>>>>>> should have a limit on the number of BFD sessions.
>>>>>>
>>>>>
>>>>> [ag] I was hoping for two things:
>>>>> - A mention about the scalability issue right where per-VNI BFD is
>>>>> discussed.  (Not sure why that is a security issue/consideration.)
>>>>>
>>>> GIM2>> I've added the following sentence in both places:
>>>> The implementation SHOULD have a reasonable upper bound on the number
>>>> of BFD sessions that can be created between the same pair of VTEPs.
>>>>
>>>
>>> [ag2] What are the criteria for determining what is reasonable?
>>>
>> GIM>> I usually understand that as a requirement to make it controllable,
>> i.e., to have a configurable limit. Thus it will be up to the network
>> operator to set the limit.
>>
>>>
>>>
>>>> - What is the benefit of running BFD per VNI between a pair of VTEPs?
>>>>>
>>>> GIM2>> An alternative would be to run CFM between VMs, if there's the
>>>> need to monitor liveliness of the particular VM. Again, this is optional.
>>>>
>>>
>>> [ag2] I'm not sure how running per-VNI BFD between the VTEPs allows one
>>> to monitor the liveliness of VMs.
>>>
>>
> [ag3] I think you missed responding to this.  I'm not sure of the value of
> running BFD per VNI between VTEPs.  What am I getting that is not covered
> by running a single BFD session with VNI 0 between the VTEPs?
>
>
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> Sections 5.1 and 6.1
>>>>>>>
>>>>>>> In 5.1 we have
>>>>>>> >>>
>>>>>>> The inner MAC frame carrying the BFD payload has the
>>>>>>> following format:
>>>>>>> ... Source IP: IP address of the originating VTEP. Destination IP:
>>>>>>> IP address of the terminating VTEP.
>>>>>>> >>>
>>>>>>>
>>>>>>> In 6.1 we have
>>>>>>> >>>
>>>>>>>
>>>>>>> Since multiple BFD sessions may be running between two
>>>>>>> VTEPs, there needs to be a mechanism for demultiplexing received BFD
>>>>>>> packets to the proper session.  The procedure for demultiplexing
>>>>>>> packets with Your Discriminator equal to 0 is different from
>>>>>>> [RFC5880 <https://tools.ietf.org/html/rfc5880>].
>>>>>>> *For such packets, the BFD session MUST be identified
>>>>>>> using the inner headers, i.e., the source IP and the destination IP
>>>>>>> present in the IP header carried by the payload of the VXLAN
>>>>>>> encapsulated packet.*
>>>>>>>
>>>>>>>
>>>>>>> >>>
>>>>>>> How does this work if the source IP and dest IP are the same as
>>>>>>> specified in 5.1?
>>>>>>>
>>>>>> GIM>> You're right, Destination and source IP addresses likely are
>>>>>> the same in this case. Will add that the source UDP port number, along with
>>>>>> the pair of IP addresses, MUST be used to demux received BFD control
>>>>>> packets. Would you agree that will be sufficient?
>>>>>>
>>>>>
>>>>> [ag] Yes, I think that should work.
>>>>>
>>>>>>
>>>>>>> Editorial
>>>>>>>
>>>>>>
>>>>> [ag] Agree with all comments on this section.
>>>>>
>>>>>>
>>>>>>> - Terminology section should be renamed to acronyms.
>>>>>>>
>>>>>> GIM>> Accepted
>>>>>>
>>>>>>> - Document would benefit from a thorough editorial scrub, but maybe
>>>>>>> that will happen once it gets to the RFC editor.
>>>>>>>
>>>>>> GIM>> Will certainly have helpful comments from ADs and RFC editor.
>>>>>>
>>>>>>>
>>>>>>> Section 1
>>>>>>> >>>
>>>>>>> "Virtual eXtensible Local Area Network" (VXLAN) [RFC7348
>>>>>>> <https://tools.ietf.org/html/rfc7348>]. provides an encapsulation
>>>>>>> scheme that allows virtual machines (VMs) to communicate in a data center
>>>>>>> network.
>>>>>>> >>>
>>>>>>> This is not accurate.  VXLAN allows you to implement an overlay to
>>>>>>> decouple the address space of the attached hosts from that of the network.
>>>>>>>
>>>>>> GIM>> Thank you for the suggested text. Will change as follows:
>>>>>> OLD TEXT:
>>>>>>    "Virtual eXtensible Local Area Network" (VXLAN) [RFC7348]. provides
>>>>>>    an encapsulation scheme that allows virtual machines (VMs) to
>>>>>>    communicate in a data center network.
>>>>>> NEW TEXT:
>>>>>>    "Virtual eXtensible Local Area Network" (VXLAN) [RFC7348] provides
>>>>>>    an encapsulation scheme that allows building an overlay network by
>>>>>>    decoupling the address space of the attached virtual hosts from
>>>>>>    that of the network.
>>>>>>
>>>>>>>
>>>>>>> Section 7
>>>>>>>
>>>>>>> VTEP's -> VTEPs
>>>>>>>
>>>>>> GIM>> Yes, thank you.
>>>>>>
>>>>>