Re: WGLC comments on draft-ietf-bfd-vxlan

Anoop Ghanwani <anoop@alumni.duke.edu> Thu, 22 November 2018 07:00 UTC

From: Anoop Ghanwani <anoop@alumni.duke.edu>
Date: Wed, 21 Nov 2018 22:59:56 -0800
Message-ID: <CA+-tSzy=9fJmMYK3RAnZgqj5-GVBAg1RAaMbfkEbxX-=d=VxRw@mail.gmail.com>
Subject: Re: WGLC comments on draft-ietf-bfd-vxlan
To: Greg Mirsky <gregimirsky@gmail.com>
Cc: rtg-bfd@ietf.org, nvo3@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/khd78gMhSSkIptOvttXqSLDgODg>

Hi Greg,

See below prefixed with [ag4].

Thanks,
Anoop

On Wed, Nov 21, 2018 at 4:36 PM Greg Mirsky <gregimirsky@gmail.com> wrote:

> Hi Anoop,
> apologies for the miss. Is it the last outstanding? Let's bring it to the
> front then.
>
> - What is the benefit of running BFD per VNI between a pair of VTEPs?
>>>>>
>>>> GIM2>> An alternative would be to run CFM between VMs, if there's the
>>>> need to monitor liveliness of the particular VM. Again, this is optional.
>>>>
>>>
>>> [ag2] I'm not sure how running per-VNI BFD between the VTEPs allows one
>>> to monitor the liveliness of VMs.
>>>
>>
> [ag3] I think you missed responding to this.  I'm not sure of the value of
> running BFD per VNI between VTEPs.  What am I getting that is not covered
> by running a single BFD session with VNI 0 between the VTEPs?
>
> GIM3>> I've misspoken. Non-zero VNI is recommended to be used to
> demultiplex BFD sessions between the same VTEPs. In section 6.1:
>    The procedure for demultiplexing
>    packets with Your Discriminator equal to 0 is different from
>    [RFC5880].  For such packets, the BFD session MUST be identified
>    using the inner headers, i.e., the source IP and the destination IP
>    present in the IP header carried by the payload of the VXLAN
>    encapsulated packet.  The VNI of the packet SHOULD be used to derive
>    interface-related information for demultiplexing the packet.
>
> Hope that clarifies the use of non-zero VNI in VXLAN encapsulation of a
> BFD control packet.
>

[ag4] This tells me how the VNI is used for BFD packets being
sent/received.  What is the use case/benefit of doing that?  I am creating
a special interface with VNI 0 just for BFD.  Why do I now need to run BFD
on any/all of the other VNIs?  As a developer reading this spec, should I
be building this capability or not?  Basically, what I'm getting at is that
I think the draft should recommend using VNI 0.  If there is a convincing
use case for running BFD over the other VNIs serviced by a VTEP, then that
needs to be explained.  And as I mentioned before, per-VNI sessions lead to
scaling issues, so it would be good if an implementation only needed to
worry about sending BFD messages on VNI 0.
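To make the section 6.1 behavior under discussion concrete, the demultiplexing procedure (plus the source-UDP-port disambiguation agreed elsewhere in this thread, since both inner IPs are VTEP addresses and may be shared by many sessions) might be sketched as follows. This is only an illustrative reading of the draft text; the names `BfdSession`, `demux_bfd`, and the lookup-table layout are assumptions, not from the draft.

```python
# Hypothetical sketch of BFD demultiplexing for VXLAN-encapsulated control
# packets, per the draft's section 6.1 as discussed in this thread.
# All identifiers are illustrative, not from the draft or RFC 5880.

class BfdSession:
    def __init__(self, local_discr):
        self.local_discr = local_discr

def demux_bfd(sessions_by_discr, sessions_by_flow,
              your_discr, inner_src_ip, inner_dst_ip, src_udp_port):
    """Return the BFD session a received packet belongs to, or None."""
    if your_discr != 0:
        # Normal RFC 5880 path: Your Discriminator selects the session.
        return sessions_by_discr.get(your_discr)
    # Your Discriminator == 0: fall back to the inner IP header.  Since the
    # inner source and destination are both VTEP addresses (section 5.1) and
    # can be identical across sessions, the source UDP port is added to the
    # key to keep the lookup unambiguous.
    return sessions_by_flow.get((inner_src_ip, inner_dst_ip, src_udp_port))
```

Under this reading, a VNI-0-only deployment needs exactly one flow-table entry per VTEP pair, which is the scaling argument above.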


>
> Regards,
> Greg
>
> On Tue, Nov 20, 2018 at 12:14 PM Anoop Ghanwani <anoop@alumni.duke.edu>
> wrote:
>
>> Hi Greg,
>>
>> Please see inline prefixed by [ag3].
>>
>> Thanks,
>> Anoop
>>
>> On Fri, Nov 16, 2018 at 5:29 PM Greg Mirsky <gregimirsky@gmail.com>
>> wrote:
>>
>>> Hi Anoop,
>>> thank you for the discussion. Please find my responses tagged GIM3>>.
>>> Also, attached diff and the updated working version of the draft. Hope
>>> we're converging.
>>>
>>> Regards,
>>> Greg
>>>
>>> On Wed, Nov 14, 2018 at 11:00 PM Anoop Ghanwani <anoop@alumni.duke.edu>
>>> wrote:
>>>
>>>> Hi Greg,
>>>>
>>>> Please see inline prefixed with [ag2].
>>>>
>>>> Thanks,
>>>> Anoop
>>>>
>>>> On Wed, Nov 14, 2018 at 9:45 AM Greg Mirsky <gregimirsky@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Anoop,
>>>>> thank you for the expedient response. I am glad that some of my
>>>>> responses have addressed your concerns. Please find followup notes in-line
>>>>> tagged GIM2>>. I've attached the diff to highlight the updates applied in
>>>>> the working version. Let me know if these are acceptable changes.
>>>>>
>>>>> Regards,
>>>>> Greg
>>>>>
>>>>> On Tue, Nov 13, 2018 at 12:30 PM Anoop Ghanwani <anoop@alumni.duke.edu>
>>>>> wrote:
>>>>>
>>>>>> Hi Greg,
>>>>>>
>>>>>> Please see inline prefixed with [ag].
>>>>>>
>>>>>> Thanks,
>>>>>> Anoop
>>>>>>
>>>>>> On Tue, Nov 13, 2018 at 11:34 AM Greg Mirsky <gregimirsky@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Anoop,
>>>>>>> many thanks for the thorough review and detailed comments. Please
>>>>>>> find my answers, this time for real, in-line tagged GIM>>.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Greg
>>>>>>>
>>>>>>> On Thu, Nov 8, 2018 at 1:58 AM Anoop Ghanwani <anoop@alumni.duke.edu>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Here are my comments.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anoop
>>>>>>>>
>>>>>>>> ==
>>>>>>>>
>>>>>>>> Philosophical
>>>>>>>>
>>>>>>>> Since VXLAN is not an IETF standard, should we be defining a
>>>>>>>> standard for running BFD on it?  Should we define BFD over Geneve instead
>>>>>>>> which is the official WG selection?  Is that going to be a separate
>>>>>>>> document?
>>>>>>>> GIM>> IS-IS is not on the Standards Track either, but that has not
>>>>>>>> prevented the IETF from developing tens of Standards Track RFCs using RFC 1142
>>>>>>>> as a normative reference until RFC 7142 re-classified it as historic. A
>>>>>>>> similar path was followed with IS-IS-TE by publishing RFC 3784 until it was
>>>>>>>> obsoleted by RFC 5305 four years later. I understand that a Down Reference,
>>>>>>>> i.e., using an Informational RFC as a normative reference, is not an unusual
>>>>>>>> situation.
>>>>>>>>
>>>>>>>
>>>>>> [ag] OK.  I'm not an expert on this part so unless someone else that
>>>>>> is an expert (chairs, AD?) can comment on it, I'll just let it go.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Technical
>>>>>>>>
>>>>>>>> Section 1:
>>>>>>>>
>>>>>>>> This part needs to be rewritten:
>>>>>>>> >>>
>>>>>>>> The individual racks may be part of a different Layer 3 network, or
>>>>>>>> they could be in a single Layer 2 network. The VXLAN segments/overlays are
>>>>>>>> overlaid on top of Layer 3 network. A VM can communicate with another VM
>>>>>>>> only if they are on the same VXLAN segment.
>>>>>>>> >>>
>>>>>>>> It's hard to parse and, given IRB,
>>>>>>>>
>>>>>>> GIM>> Would the following text be acceptable:
>>>>>>> OLD TEXT:
>>>>>>>    VXLAN is typically deployed in data centers interconnecting
>>>>>>>    virtualized hosts, which may be spread across multiple racks.  The
>>>>>>>    individual racks may be part of a different Layer 3 network, or
>>>>>>> they
>>>>>>>    could be in a single Layer 2 network.  The VXLAN segments/overlays
>>>>>>>    are overlaid on top of Layer 3 network.
>>>>>>> NEW TEXT:
>>>>>>> VXLAN is typically deployed in data centers interconnecting virtualized
>>>>>>> hosts of a tenant. VXLAN addresses requirements of the Layer 2 and
>>>>>>> Layer 3 data center network infrastructure in the presence of VMs in
>>>>>>> a multi-tenant environment, discussed in Section 3 of [RFC7348], by
>>>>>>> providing a Layer 2 overlay scheme on a Layer 3 network.
>>>>>>>
>>>>>>
>>>>>> [ag] This is a lot better.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>  A VM can communicate with another VM only if they are on the same
>>>>>>> VXLAN segment.
>>>>>>>>
>>>>>>>> the last sentence above is wrong.
>>>>>>>>
>>>>>>> GIM>> Section 4 in RFC 7348 states:
>>>>>>> Only VMs within the same VXLAN segment can communicate with each
>>>>>>> other.
>>>>>>>
>>>>>>
>>>>>> [ag] VMs on different segments can communicate using routing/IRB, so
>>>>>> even RFC 7348 is wrong.  Perhaps the text should be modified so say -- "In
>>>>>> the absence of a router in the overlay, a VM can communicate...".
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Section 3:
>>>>>>>> >>>
>>>>>>>>  Most deployments will have VMs with only L2 capabilities that
>>>>>>>> may not support L3.
>>>>>>>> >>>
>>>>>>>> Are you suggesting most deployments have VMs with no IP
>>>>>>>> addresses/configuration?
>>>>>>>>
>>>>>>> GIM>> Would re-word as follows:
>>>>>>> OLD TEXT:
>>>>>>>  Most deployments will have VMs with only L2 capabilities that
>>>>>>>  may not support L3.
>>>>>>> NEW TEXT:
>>>>>>> Deployments may have VMs with only L2 capabilities that do not
>>>>>>> support L3.
>>>>>>>
>>>>>>
>>>>>> [ag] I still don't understand this.  What does it mean for a VM to
>>>>>> not support L3?  No IP address, no default GW, something else?
>>>>>>
>>>>> GIM2>> A VM communicates with its VTEP which, in turn, originates the
>>>>> VXLAN tunnel. The VM is not required to have an IP address, as it is the
>>>>> VTEP's IP address that the VM's MAC is associated with. As for the gateway,
>>>>> RFC 7348 discusses the VXLAN gateway as the device that forwards traffic
>>>>> between VXLAN and non-VXLAN domains. Considering all that, would the
>>>>> following change be acceptable:
>>>>> OLD TEXT:
>>>>>  Most deployments will have VMs with only L2 capabilities that
>>>>>  may not support L3.
>>>>> NEW TEXT:
>>>>>  Most deployments will have VMs with only L2 capabilities and not have
>>>>> an IP address assigned.
>>>>>
>>>>
>>>> [ag2] Do you have a reference for this (i.e. that most deployments have
>>>> VMs without an IP address)?  Normally I would think VMs would have an IP
>>>> address.  It's just that they are segregated into segments and, without an
>>>> intervening router, they are restricted to communicate only within their
>>>> subnet.
>>>>
>>> GIM3>> Would the following text be acceptable:
>>>
>>> Deployments might have VMs with only L2 capabilities and no IP
>>> address assigned or, in other cases, VMs that are assigned an IP address
>>> but are restricted to communicate only within their subnet.
>>>
>>>
>> [ag3] Yes, this is better.
>>
>>
>>>>>>
>>>>>>>
>>>>>>>> >>>
>>>>>>>> Having a hierarchical OAM model helps localize faults though it
>>>>>>>> requires additional consideration.
>>>>>>>> >>>
>>>>>>>> What are the additional considerations?
>>>>>>>>
>>>>>>> GIM>> For example, coordination of BFD intervals across the OAM
>>>>>>> layers.
>>>>>>>
>>>>>>
>>>>>> [ag] Can we mention them in the draft?
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Would be useful to add a reference to RFC 8293 in case the reader
>>>>>>>> would like to know more about service nodes.
>>>>>>>>
>>>>>>> GIM>> I have to admit that I don't see how RFC 8293, "A Framework
>>>>>>> for Multicast in Network Virtualization over Layer 3", is related to this
>>>>>>> document. Please help with an additional reference to the text of the
>>>>>>> document.
>>>>>>>
>>>>>>
>>>>>> [ag] The RFC discusses the use of service nodes, which are mentioned
>>>>>> here.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Section 4
>>>>>>>> >>>
>>>>>>>> Separate BFD sessions can be established between the VTEPs (IP1 and
>>>>>>>> IP2) for monitoring each of the VXLAN tunnels (VNI 100 and 200).
>>>>>>>> >>>
>>>>>>>> IMO, the document should mention that this could lead to scaling
>>>>>>>> issues given that VTEPs can support well in excess of 4K VNIs.
>>>>>>>> Additionally, we should mention that with IRB, a given VNI may not even
>>>>>>>> exist on the destination VTEP.  Finally, what is the benefit of doing
>>>>>>>> this?  There may be certain corner cases where it's useful (vs a single BFD
>>>>>>>> session between the VTEPs for all VNIs) but it would be good to explain
>>>>>>>> what those are.
>>>>>>>>
>>>>>>> GIM>> Will add text in the Security Considerations section that
>>>>>>> VTEPs should have a limit on the number of BFD sessions.
>>>>>>>
>>>>>>
>>>>>> [ag] I was hoping for two things:
>>>>>> - A mention about the scalability issue right where per-VNI BFD is
>>>>>> discussed.  (Not sure why that is a security issue/consideration.)
>>>>>>
>>>>> GIM2>> I've added the following sentence in both places:
>>>>> The implementation SHOULD have a reasonable upper bound on the number
>>>>> of BFD sessions that can be created between the same pair of VTEPs.
>>>>>
>>>>
>>>> [ag2] What are the criteria for determining what is reasonable?
>>>>
>>> GIM>> I usually understand that as a requirement to make it controllable,
>>> i.e., to have a configurable limit. Thus it will be up to a network operator
>>> to set the limit.
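[The configurable limit described here might be enforced along the following lines; this is only a sketch, and the class, parameter name, and default value are hypothetical, not from the draft.]

```python
# Illustrative enforcement of an operator-configurable upper bound on BFD
# sessions between the same pair of VTEPs, per the SHOULD discussed above.
# Names and the default value are assumptions, not from the draft.

class BfdSessionTable:
    def __init__(self, max_sessions_per_vtep_pair=8):
        # Operator-configurable limit, settable per deployment policy.
        self.max_sessions_per_vtep_pair = max_sessions_per_vtep_pair
        self._sessions = {}  # (local_vtep, remote_vtep) -> list of session ids

    def create_session(self, local_vtep, remote_vtep, session_id):
        """Create a session, refusing once the configured limit is reached."""
        key = (local_vtep, remote_vtep)
        existing = self._sessions.setdefault(key, [])
        if len(existing) >= self.max_sessions_per_vtep_pair:
            return False  # reject rather than exhaust VTEP resources
        existing.append(session_id)
        return True
```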
>>>
>>>>
>>>>
>>>>> - What is the benefit of running BFD per VNI between a pair of VTEPs?
>>>>>>
>>>>> GIM2>> An alternative would be to run CFM between VMs, if there's the
>>>>> need to monitor liveliness of the particular VM. Again, this is optional.
>>>>>
>>>>
>>>> [ag2] I'm not sure how running per-VNI BFD between the VTEPs allows one
>>>> to monitor the liveliness of VMs.
>>>>
>>>
>> [ag3] I think you missed responding to this.  I'm not sure of the value
>> of running BFD per VNI between VTEPs.  What am I getting that is not
>> covered by running a single BFD session with VNI 0 between the VTEPs?
>>
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>>> Sections 5.1 and 6.1
>>>>>>>>
>>>>>>>> In 5.1 we have
>>>>>>>> >>>
>>>>>>>> The inner MAC frame carrying the BFD payload has the
>>>>>>>> following format:
>>>>>>>> ... Source IP: IP address of the originating VTEP. Destination IP:
>>>>>>>> IP address of the terminating VTEP.
>>>>>>>> >>>
>>>>>>>>
>>>>>>>> In 6.1 we have
>>>>>>>> >>>
>>>>>>>>
>>>>>>>> Since multiple BFD sessions may be running between two
>>>>>>>> VTEPs, there needs to be a mechanism for demultiplexing received BFD
>>>>>>>> packets to the proper session.  The procedure for demultiplexing
>>>>>>>> packets with Your Discriminator equal to 0 is different from [RFC5880 <https://tools.ietf.org/html/rfc5880>].
>>>>>>>>
>>>>>>>> For such packets, the BFD session MUST be identified
>>>>>>>> using the inner headers, i.e., the source IP and the destination IP
>>>>>>>> present in the IP header carried by the payload of the VXLAN
>>>>>>>> encapsulated packet.
>>>>>>>>
>>>>>>>>
>>>>>>>> >>>
>>>>>>>> How does this work if the source IP and dest IP are the same as
>>>>>>>> specified in 5.1?
>>>>>>>>
>>>>>>> GIM>> You're right, destination and source IP addresses are likely
>>>>>>> the same in this case. Will add that the source UDP port number, along with
>>>>>>> the pair of IP addresses, MUST be used to demultiplex received BFD control
>>>>>>> packets. Would you agree that will be sufficient?
>>>>>>>
>>>>>>
>>>>>> [ag] Yes, I think that should work.
>>>>>>
>>>>>>>
>>>>>>>> Editorial
>>>>>>>>
>>>>>>>
>>>>>> [ag] Agree with all comments on this section.
>>>>>>
>>>>>>>
>>>>>>>> - Terminology section should be renamed to acronyms.
>>>>>>>>
>>>>>>> GIM>> Accepted
>>>>>>>
>>>>>>>> - Document would benefit from a thorough editorial scrub, but maybe
>>>>>>>> that will happen once it gets to the RFC editor.
>>>>>>>>
>>>>>>> GIM>> Will certainly have helpful comments from ADs and RFC editor.
>>>>>>>
>>>>>>>>
>>>>>>>> Section 1
>>>>>>>> >>>
>>>>>>>> "Virtual eXtensible Local Area Network" (VXLAN) [RFC7348
>>>>>>>> <https://tools.ietf.org/html/rfc7348>]. provides an encapsulation
>>>>>>>> scheme that allows virtual machines (VMs) to communicate in a data center
>>>>>>>> network.
>>>>>>>> >>>
>>>>>>>> This is not accurate.  VXLAN allows you to implement an overlay to
>>>>>>>> decouple the address space of the attached hosts from that of the network.
>>>>>>>>
>>>>>>> GIM>> Thank you for the suggested text. Will change as follows:
>>>>>>> OLD TEXT:
>>>>>>>    "Virtual eXtensible Local Area Network" (VXLAN) [RFC7348].
>>>>>>> provides
>>>>>>>    an encapsulation scheme that allows virtual machines (VMs) to
>>>>>>>    communicate in a data center network.
>>>>>>> NEW TEXT:
>>>>>>>  "Virtual eXtensible Local Area Network" (VXLAN) [RFC7348] provides
>>>>>>>    an encapsulation scheme that allows building an overlay network by
>>>>>>>    decoupling the address space of the attached virtual hosts from
>>>>>>>    that of the network.
>>>>>>>
>>>>>>>>
>>>>>>>> Section 7
>>>>>>>>
>>>>>>>> VTEP's -> VTEPs
>>>>>>>>
>>>>>>> GIM>> Yes, thank you.
>>>>>>>
>>>>>>