Re: WGLC comments on draft-ietf-bfd-vxlan

Anoop Ghanwani <anoop@alumni.duke.edu> Tue, 20 November 2018 20:15 UTC

From: Anoop Ghanwani <anoop@alumni.duke.edu>
Date: Tue, 20 Nov 2018 12:14:42 -0800
Message-ID: <CA+-tSzyCKsQx9zTMjjTpwjF=tL2WOz7hNUff_KFQwL8n2Y+xUg@mail.gmail.com>
Subject: Re: WGLC comments on draft-ietf-bfd-vxlan
To: Greg Mirsky <gregimirsky@gmail.com>
Cc: rtg-bfd@ietf.org, nvo3@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/wykUnzuqtZO404587FCxlmZRk4I>

Hi Greg,

Please see inline prefixed by [ag3].

Thanks,
Anoop

On Fri, Nov 16, 2018 at 5:29 PM Greg Mirsky <gregimirsky@gmail.com> wrote:

> Hi Anoop,
> thank you for the discussion. Please find my responses tagged GIM3>>.
> Also, attached diff and the updated working version of the draft. Hope
> we're converging.
>
> Regards,
> Greg
>
> On Wed, Nov 14, 2018 at 11:00 PM Anoop Ghanwani <anoop@alumni.duke.edu>
> wrote:
>
>> Hi Greg,
>>
>> Please see inline prefixed with [ag2].
>>
>> Thanks,
>> Anoop
>>
>> On Wed, Nov 14, 2018 at 9:45 AM Greg Mirsky <gregimirsky@gmail.com>
>> wrote:
>>
>>> Hi Anoop,
>>> thank you for the expedient response. I am glad that some of my
>>> responses have addressed your concerns. Please find followup notes in-line
>>> tagged GIM2>>. I've attached the diff to highlight the updates applied in
>>> the working version. Let me know if these are acceptable changes.
>>>
>>> Regards,
>>> Greg
>>>
>>> On Tue, Nov 13, 2018 at 12:30 PM Anoop Ghanwani <anoop@alumni.duke.edu>
>>> wrote:
>>>
>>>> Hi Greg,
>>>>
>>>> Please see inline prefixed with [ag].
>>>>
>>>> Thanks,
>>>> Anoop
>>>>
>>>> On Tue, Nov 13, 2018 at 11:34 AM Greg Mirsky <gregimirsky@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Anoop,
>>>>> many thanks for the thorough review and detailed comments. Please find
>>>>> my answers, this time for real, in-line tagged GIM>>.
>>>>>
>>>>> Regards,
>>>>> Greg
>>>>>
>>>>> On Thu, Nov 8, 2018 at 1:58 AM Anoop Ghanwani <anoop@alumni.duke.edu>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Here are my comments.
>>>>>>
>>>>>> Thanks,
>>>>>> Anoop
>>>>>>
>>>>>> ==
>>>>>>
>>>>>> Philosophical
>>>>>>
>>>>>> Since VXLAN is not an IETF standard, should we be defining a standard
>>>>>> for running BFD on it?  Should we define BFD over Geneve instead which is
>>>>>> the official WG selection?  Is that going to be a separate document?
>>>>>> GIM>> IS-IS is not on the Standards Track either, but that has not
>>>>>> prevented the IETF from developing tens of Standards Track RFCs using RFC 1142
>>>>>> as the normative reference until RFC 7142 reclassified it as Historic. A
>>>>>> similar path was followed with IS-IS-TE by publishing RFC 3784 until it was
>>>>>> obsoleted by RFC 5305 four years later. I understand that a Down Reference,
>>>>>> i.e., using an Informational RFC as the normative reference, is not an unusual
>>>>>> situation.
>>>>>>
>>>>>
>>>> [ag] OK.  I'm not an expert on this part, so unless someone else who is
>>>> an expert (chairs, AD?) can comment on it, I'll just let it go.
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Technical
>>>>>>
>>>>>> Section 1:
>>>>>>
>>>>>> This part needs to be rewritten:
>>>>>> >>>
>>>>>> The individual racks may be part of a different Layer 3 network, or
>>>>>> they could be in a single Layer 2 network. The VXLAN segments/overlays are
>>>>>> overlaid on top of Layer 3 network. A VM can communicate with another VM
>>>>>> only if they are on the same VXLAN segment.
>>>>>> >>>
>>>>>> It's hard to parse and, given IRB,
>>>>>>
>>>>> GIM>> Would the following text be acceptable:
>>>>> OLD TEXT:
>>>>>    VXLAN is typically deployed in data centers interconnecting
>>>>>    virtualized hosts, which may be spread across multiple racks.  The
>>>>>    individual racks may be part of a different Layer 3 network, or they
>>>>>    could be in a single Layer 2 network.  The VXLAN segments/overlays
>>>>>    are overlaid on top of Layer 3 network.
>>>>> NEW TEXT:
>>>>> VXLAN is typically deployed in data centers interconnecting virtualized
>>>>> hosts of a tenant. VXLAN addresses requirements of the Layer 2 and
>>>>> Layer 3 data center network infrastructure in the presence of VMs in
>>>>> a multi-tenant environment, discussed in Section 3 of [RFC7348], by
>>>>> providing a Layer 2 overlay scheme on a Layer 3 network.
>>>>>
>>>>
>>>> [ag] This is a lot better.
>>>>
>>>>
>>>>>
>>>>>  A VM can communicate with another VM only if they are on the same
>>>>> VXLAN segment.
>>>>>>
>>>>>> the last sentence above is wrong.
>>>>>>
>>>>> GIM>> Section 4 in RFC 7348 states:
>>>>> Only VMs within the same VXLAN segment can communicate with each other.
>>>>>
>>>>
>>>> [ag] VMs on different segments can communicate using routing/IRB, so
>>>> even RFC 7348 is wrong.  Perhaps the text should be modified to say -- "In
>>>> the absence of a router in the overlay, a VM can communicate...".
>>>>
>>>>
>>>>>
>>>>> Section 3:
>>>>>> >>>
>>>>>>  Most deployments will have VMs with only L2 capabilities that
>>>>>> may not support L3.
>>>>>> >>>
>>>>>> Are you suggesting most deployments have VMs with no IP
>>>>>> addresses/configuration?
>>>>>>
>>>>> GIM>> Would re-word as follows:
>>>>> OLD TEXT:
>>>>>  Most deployments will have VMs with only L2 capabilities that
>>>>>  may not support L3.
>>>>> NEW TEXT:
>>>>> Deployments may have VMs with only L2 capabilities that do not support
>>>>> L3.
>>>>>
>>>>
>>>> [ag] I still don't understand this.  What does it mean for a VM to not
>>>> support L3?  No IP address, no default GW, something else?
>>>>
>>> GIM2>> A VM communicates with its VTEP which, in turn, originates the VXLAN
>>> tunnel. A VM is not required to have an IP address, as it is the VTEP's IP
>>> address that the VM's MAC is associated with. As for the gateway, RFC 7348
>>> discusses a VXLAN gateway as the device that forwards traffic between VXLAN
>>> and non-VXLAN domains. Considering all that, would the following change be
>>> acceptable:
>>> OLD TEXT:
>>>  Most deployments will have VMs with only L2 capabilities that
>>>  may not support L3.
>>> NEW TEXT:
>>>  Most deployments will have VMs with only L2 capabilities and not have
>>> an IP address assigned.
>>>
>>
>> [ag2] Do you have a reference for this (i.e. that most deployments have
>> VMs without an IP address)?  Normally I would think VMs would have an IP
>> address.  It's just that they are segregated into segments and, without an
>> intervening router, they are restricted to communicate only within their
>> subnet.
>>
> GIM3>> Would the following text be acceptable:
>
> Deployments might have VMs with only L2 capabilities that do not have an IP
> address assigned or,
> in other cases, VMs that are assigned an IP address but are restricted to
> communicate only within their subnet.
>
>
[ag3] Yes, this is better.


>>>>
>>>>>
>>>>>> >>>
>>>>>> Having a hierarchical OAM model helps localize faults though it
>>>>>> requires additional consideration.
>>>>>> >>>
>>>>>> What are the additional considerations?
>>>>>>
>>>>> GIM>> For example, coordination of BFD intervals across the OAM
>>>>> layers.
>>>>>
>>>>
>>>> [ag] Can we mention them in the draft?
>>>>
>>>>
>>>>>
>>>>>> Would be useful to add a reference to RFC 8293 in case the reader
>>>>>> would like to know more about service nodes.
>>>>>>
>>>>> GIM>> I have to admit that I don't find how RFC 8293  A Framework for
>>>>> Multicast in Network Virtualization over Layer 3 is related to this
>>>>> document. Please help with additional reference to the text of the
>>>>> document.
>>>>>
>>>>
>>>> [ag] The RFC discusses the use of service nodes which is mentioned
>>>> here.
>>>>
>>>>
>>>>>
>>>>>> Section 4
>>>>>> >>>
>>>>>> Separate BFD sessions can be established between the VTEPs (IP1 and
>>>>>> IP2) for monitoring each of the VXLAN tunnels (VNI 100 and 200).
>>>>>> >>>
>>>>>> IMO, the document should mention that this could lead to scaling
>>>>>> issues given that VTEPs can support well in excess of 4K VNIs.
>>>>>> Additionally, we should mention that with IRB, a given VNI may not even
>>>>>> exist on the destination VTEP.  Finally, what is the benefit of doing
>>>>>> this?  There may be certain corner cases where it's useful (vs a single BFD
>>>>>> session between the VTEPs for all VNIs) but it would be good to explain
>>>>>> what those are.
>>>>>>
>>>>> GIM>> Will add text in the Security Considerations section that VTEPs
>>>>> should have limit on number of BFD sessions.
>>>>>
>>>>
>>>> [ag] I was hoping for two things:
>>>> - A mention about the scalability issue right where per-VNI BFD is
>>>> discussed.  (Not sure why that is a security issue/consideration.)
>>>>
>>> GIM2>> I've added the following sentence in both places:
>>> The implementation SHOULD have a reasonable upper bound on the number of
>>> BFD sessions that can be created between the same pair of VTEPs.
>>>
>>
>> [ag2] What are the criteria for determining what is reasonable?
>>
> GIM>> I usually understand that as a requirement to make it controllable,
> i.e., to have a configurable limit. Thus it will be up to a network operator
> to set the limit.
>
>>
>>
>>> - What is the benefit of running BFD per VNI between a pair of VTEPs?
>>>>
>>> GIM2>> An alternative would be to run CFM between VMs, if there's a
>>> need to monitor the liveness of a particular VM. Again, this is optional.
>>>
>>
>> [ag2] I'm not sure how running per-VNI BFD between the VTEPs allows one
>> to monitor the liveness of VMs.
>>
>
[ag3] I think you missed responding to this.  I'm not sure of the value of
running BFD per VNI between VTEPs.  What am I getting that is not covered
by running a single BFD session with VNI 0 between the VTEPs?
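
As an illustration of the configurable limit proposed above for BFD sessions between the same pair of VTEPs, here is a minimal sketch. It assumes the operator sets a single per-pair maximum; the names `BfdSessionTable`, `SessionLimitExceeded`, and `max_sessions_per_pair` are hypothetical and do not come from the draft.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# (local VTEP IP, remote VTEP IP); a hypothetical key chosen for this sketch.
VtepPair = Tuple[str, str]


class SessionLimitExceeded(Exception):
    """Raised when a VTEP pair already holds the configured maximum."""


class BfdSessionTable:
    """Tracks per-VNI BFD sessions and enforces an operator-set cap per VTEP pair."""

    def __init__(self, max_sessions_per_pair: int) -> None:
        if max_sessions_per_pair < 1:
            raise ValueError("max_sessions_per_pair must be at least 1")
        self.max_sessions_per_pair = max_sessions_per_pair
        self._vnis: Dict[VtepPair, Set[int]] = defaultdict(set)

    def create(self, local_vtep: str, remote_vtep: str, vni: int) -> None:
        """Register a session for the VNI, refusing once the cap is reached."""
        vnis = self._vnis[(local_vtep, remote_vtep)]
        if vni in vnis:
            return  # a session for this VNI already exists; nothing to do
        if len(vnis) >= self.max_sessions_per_pair:
            raise SessionLimitExceeded(
                f"{local_vtep}->{remote_vtep}: limit of "
                f"{self.max_sessions_per_pair} BFD sessions reached"
            )
        vnis.add(vni)

    def count(self, local_vtep: str, remote_vtep: str) -> int:
        """Return how many sessions exist between the given pair of VTEPs."""
        return len(self._vnis.get((local_vtep, remote_vtep), ()))
```

With a cap of 2, sessions for VNI 100 and VNI 200 between IP1 and IP2 would be accepted and a third request would raise `SessionLimitExceeded`.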


>
>>
>>>
>>>>
>>>>>
>>>>>> Sections 5.1 and 6.1
>>>>>>
>>>>>> In 5.1 we have
>>>>>> >>>
>>>>>> The inner MAC frame carrying the BFD payload has the
>>>>>> following format:
>>>>>> ... Source IP: IP address of the originating VTEP. Destination IP: IP
>>>>>> address of the terminating VTEP.
>>>>>> >>>
>>>>>>
>>>>>> In 6.1 we have
>>>>>> >>>
>>>>>> Since multiple BFD sessions may be running between two
>>>>>> VTEPs, there needs to be a mechanism for demultiplexing received BFD
>>>>>> packets to the proper session.  The procedure for demultiplexing
>>>>>> packets with Your Discriminator equal to 0 is different from [RFC5880
>>>>>> <https://tools.ietf.org/html/rfc5880>].
>>>>>> *For such packets, the BFD session MUST be identified
>>>>>> using the inner headers, i.e., the source IP and the destination IP
>>>>>> present in the IP header carried by the payload of the VXLAN
>>>>>> encapsulated packet.*
>>>>>> >>>
>>>>>> How does this work if the source IP and dest IP are the same as
>>>>>> specified in 5.1?
>>>>>>
>>>>> GIM>> You're right, Destination and source IP addresses likely are the
>>>>> same in this case. Will add that the source UDP port number, along with the
>>>>> pair of IP addresses, MUST be used to demux received BFD control packets.
>>>>> Would you agree that will be sufficient?
>>>>>
>>>>
>>>> [ag] Yes, I think that should work.
>>>>
>>>>>
>>>>>> Editorial
>>>>>>
>>>>>
>>>> [ag] Agree with all comments on this section.
>>>>
>>>>>
>>>>>> - Terminology section should be renamed to acronyms.
>>>>>>
>>>>> GIM>> Accepted
>>>>>
>>>>>> - Document would benefit from a thorough editorial scrub, but maybe
>>>>>> that will happen once it gets to the RFC editor.
>>>>>>
>>>>> GIM>> Will certainly have helpful comments from ADs and RFC editor.
>>>>>
>>>>>>
>>>>>> Section 1
>>>>>> >>>
>>>>>> "Virtual eXtensible Local Area Network" (VXLAN) [RFC7348
>>>>>> <https://tools.ietf.org/html/rfc7348>]. provides an encapsulation
>>>>>> scheme that allows virtual machines (VMs) to communicate in a data center
>>>>>> network.
>>>>>> >>>
>>>>>> This is not accurate.  VXLAN allows you to implement an overlay to
>>>>>> decouple the address space of the attached hosts from that of the network.
>>>>>>
>>>>> GIM>> Thank you for the suggested text. Will change as follows:
>>>>> OLD TEXT:
>>>>>    "Virtual eXtensible Local Area Network" (VXLAN) [RFC7348].  provides
>>>>>    an encapsulation scheme that allows virtual machines (VMs) to
>>>>>    communicate in a data center network.
>>>>> NEW TEXT:
>>>>>    "Virtual eXtensible Local Area Network" (VXLAN) [RFC7348] provides
>>>>>    an encapsulation scheme that allows building an overlay network by
>>>>>    decoupling the address space of the attached virtual hosts from that
>>>>>    of the network.
>>>>>
>>>>>>
>>>>>> Section 7
>>>>>>
>>>>>> VTEP's -> VTEPs
>>>>>>
>>>>> GIM>> Yes, thank you.
>>>>>
>>>>
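
To make the demultiplexing change agreed above for Sections 5.1 and 6.1 concrete, here is a minimal sketch. It assumes that packets with a nonzero Your Discriminator are matched on that field as in RFC 5880, and that packets with Your Discriminator equal to 0 are matched on the inner source IP, inner destination IP, and source UDP port. The class and method names (`BfdSession`, `BfdDemux`, `add`, `lookup`) and the example addresses and ports are hypothetical, not taken from the draft.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# (inner source IP, inner destination IP, source UDP port) of a received packet.
SessionKey = Tuple[str, str, int]


@dataclass
class BfdSession:
    local_discriminator: int
    vni: int


class BfdDemux:
    """Maps received BFD control packets to sessions between a pair of VTEPs."""

    def __init__(self) -> None:
        self._by_discriminator: Dict[int, BfdSession] = {}
        self._by_key: Dict[SessionKey, BfdSession] = {}

    def add(self, session: BfdSession, src_ip: str, dst_ip: str,
            src_port: int) -> None:
        """Register a session under both lookup keys."""
        self._by_discriminator[session.local_discriminator] = session
        self._by_key[(src_ip, dst_ip, src_port)] = session

    def lookup(self, your_discriminator: int, src_ip: str, dst_ip: str,
               src_port: int) -> Optional[BfdSession]:
        """Return the matching session, or None if the packet matches nothing."""
        if your_discriminator != 0:
            # Nonzero Your Discriminator selects the session directly.
            return self._by_discriminator.get(your_discriminator)
        # Your Discriminator == 0: the IP pair alone may be identical for all
        # sessions between two VTEPs, so the source UDP port disambiguates.
        return self._by_key.get((src_ip, dst_ip, src_port))
```

Two sessions that share the same inner IP pair, for example one for VNI 100 and one for VNI 200, stay distinguishable in this sketch as long as their source UDP ports differ.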