Re: [armd] review of draft-ietf-armd-problem-statement-02

Thomas Narten <narten@us.ibm.com> Fri, 25 May 2012 19:44 UTC

Message-Id: <201205251943.q4PJhfTb019425@cichlid.raleigh.ibm.com>
To: anoop@alumni.duke.edu
In-reply-to: <CA+-tSzxY2AdMqcOSDDY3A-o+wJj=Ww5FE4btEe1uPgDMbehANA@mail.gmail.com>
References: <CA+-tSzxY2AdMqcOSDDY3A-o+wJj=Ww5FE4btEe1uPgDMbehANA@mail.gmail.com>
Comments: In-reply-to Anoop Ghanwani <ghanwani@gmail.com> message dated "Thu, 03 May 2012 19:21:54 -0700."
Date: Fri, 25 May 2012 15:43:40 -0400
From: Thomas Narten <narten@us.ibm.com>
Cc: armd@ietf.org
Subject: Re: [armd] review of draft-ietf-armd-problem-statement-02

Hi Anoop.

Thanks for your very detailed review comments. I've adopted
most of them directly. Some questions below.

Anoop Ghanwani <ghanwani@gmail.com> writes:

> Section 4.4.1
> ============

> For consistency with the following 2 sections
> change title from Layer 3 to L3.

> "This topology is ideal for scenarios where servers
>   attached to a particular access switch generally run applications
>   that are are confined to using a single subnet."
> I'm not sure I agree with this.  There are many issues
> surrounding this, including the capabilities of the devices
> in the network, the use of multicast, and the preferences
> of the network administrator.

I agree that "is ideal" is too strong. How about if I say instead:

    This topology has benefits in scenarios ...

Would that address your concerns?

> "Even though
>    layer 2 traffic are still partitioned by VLANs, the fact that all
>    VLANs are enabled on all ports can lead to broadcast traffic on all
>    VLANs to traverse all links and ports, which is same effect as one
>    big Layer 2 domain. "
> I disagree with this because all VLANs would only
> need to be provisioned on the aggregation-facing ports.
> The disadvantage here is that a lot more broadcast traffic
> hits the aggregation layer, that when we need to cross
> VLAN boundaries the traffic must go all the way to the
> aggregation switch even though the source and destination
> may be on the same access switch, and the requirement
> for larger ARP tables at the aggregation switches.

I struggled with this for a long while. Is this text any better?:

     <t> When the L3 domain only extends to aggregation switches,
         hosts in any of the IP subnets configured on the aggregation
         switches can be reached via L2 through any access switch,
         provided the access switches enable all the VLANs.  This topology
         allows a greater level of flexibility as servers attached to
         any access switch can be reloaded with applications that have
         been provisioned with IP addresses from multiple prefixes as
         needed.  Further, in such an environment, VMs can migrate
         between racks without IP address changes.  The drawback of
         this design however is that multiple VLANs have to be enabled
         on all access switches and all access-facing ports on
         aggregation switches. Even though L2 traffic is still
         partitioned by VLANs, the fact that all VLANs are enabled on
         all ports can lead to broadcast traffic on all VLANs to
         traverse all links and ports, which is same effect as one big
         L2 domain on the access-facing side of the aggregation
         switch.  In addition, internal traffic itself might have to
         cross different L2 boundaries resulting in significant ARP/ND
         load at the aggregation switches.  This design provides a
         good tradeoff between flexibility and L2 domain size.  A
         moderate sized data center might utilize this approach to
         provide high availability services at a single location.
         </t>
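
(As a sanity check on the "one big L2 domain" point, here is a toy
Python sketch -- purely illustrative, all names made up, not part of
the draft -- of how the set of access links touched by a single VLAN's
broadcast grows once every VLAN is trunked everywhere:)

    def links_hit_by_broadcast(vlan, vlan_to_switches, all_switches, trunk_all=True):
        """Access switches whose uplinks must carry a broadcast on `vlan`.

        vlan_to_switches maps a VLAN to the access switches that actually
        have members in it (hypothetical provisioning data).  With
        trunk_all=True, every VLAN is enabled on every access switch, so
        the broadcast reaches all of them -- the "one big L2 domain" effect.
        """
        if trunk_all:
            return set(all_switches)
        return set(vlan_to_switches.get(vlan, ()))

    access_switches = ["acc%d" % i for i in range(40)]   # 40 hypothetical access switches
    membership = {10: ["acc0", "acc1", "acc2"]}           # VLAN 10 only lives behind 3 of them

    print(len(links_hit_by_broadcast(10, membership, access_switches, trunk_all=True)))   # 40
    print(len(links_hit_by_broadcast(10, membership, access_switches, trunk_all=False)))  # 3
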

> "However, the
>    Overlay Edge switches/routers which perform the network address
>    encapsulation/decapsulation must ultimately perform a L2 address
>    resolution and could still potentially face scaling issues at that
>    point."
> It's not the overlay edge switches that have the scaling
> problem, it's the volume of broadcasts that need to be
> sent across the core and that is not helped simply by
> using an L3 overlay.

I also struggled quite a bit with this comment. Is the following an
improvement?:

    <t> A potential problem arises in a large data center when a large
        number of hosts communicate with peers in different subnets:
        all of these hosts send (and receive) data packets through
        their respective L2/L3 boundary nodes, since the traffic flows
        are generally bi-directional.  This has the potential to
        further highlight any scaling problems.  These
        L2/L3 boundary nodes have to process ARP/ND requests sent from
        originating subnets and resolve physical (MAC) addresses in
        the target subnets for what are generally bi-directional
        flows.  Therefore, for maximum flexibility in managing the
        data center workload, it is often desirable to use overlays to
        place related groups of hosts in the same topological subnet
        to avoid the L2/L3 boundary translation.  The use of overlays
        in the data center network can be a useful design mechanism to
        help manage a potential bottleneck at the L2/L3 boundary by
        redefining where that boundary exists.  </t>
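
(To make the encapsulation point concrete, the kind of operation an
overlay edge node performs looks roughly like the sketch below.  It
assumes a VXLAN-style 8-byte header -- one common overlay format, not
something the draft text specifies -- and omits the outer
Ethernet/IP/UDP headers an edge device would also prepend:)

    import struct

    def vxlan_encapsulate(vni: int, inner_frame: bytes) -> bytes:
        # VXLAN-style header (RFC 7348 layout): 8 bytes total.
        # Word 1: flags byte with the I bit set (0x08), rest reserved.
        # Word 2: 24-bit VNI in the upper bits, low byte reserved.
        flags_word = 0x08 << 24
        vni_word = (vni & 0xFFFFFF) << 8
        header = struct.pack("!II", flags_word, vni_word)
        # A real overlay edge device would also add outer Ethernet/IP/UDP
        # headers (UDP destination port 4789 for VXLAN); omitted here.
        return header + inner_frame

    encapsulated = vxlan_encapsulate(vni=5001, inner_frame=b"\x00" * 64)
    print(len(encapsulated))   # 72 bytes: 8-byte header + 64-byte inner frame
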


> Section 6
> ==========

> "Thus, whereas all
>    nodes must process every ARP query, ND queries are processed only by
>    the nodes to which they are intended."
> When virtualization is in use, the NIC is often operated
> in promiscuous mode, which means that the packet would
> be delivered to the hypervisor/vswitch and the filtering
> would have to be done there (usually implemented in software),
> making the problem almost as bad as with ARP.

Revised text:

    Thus, whereas all nodes must process every ARP query, ND queries
    are processed only by the nodes for which they are intended. In
    cases where multicast filtering can't effectively be implemented
    in the NIC (e.g., on hypervisors supporting virtualization),
    filtering would need to be done in software (e.g., in the
    hypervisor's vSwitch).
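
(For anyone who wants to see why a NIC that is not in promiscuous mode
can filter ND but not ARP, here is a small illustrative Python sketch
of the solicited-node multicast group an ND Neighbor Solicitation is
sent to, and the Ethernet multicast MAC a NIC would program into its
filter.  With a promiscuous NIC, this same comparison has to happen in
the vSwitch in software:)

    import ipaddress

    def solicited_node_group(target: str) -> ipaddress.IPv6Address:
        """Solicited-node multicast address for an IPv6 target (RFC 4291):
        ff02::1:ff00:0/104 plus the low 24 bits of the target address."""
        low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
        return ipaddress.IPv6Address(int(ipaddress.IPv6Address("ff02::1:ff00:0")) | low24)

    def ipv6_multicast_mac(group: ipaddress.IPv6Address) -> str:
        """Ethernet multicast MAC for an IPv6 group: 33:33 followed by
        the low 32 bits of the group address."""
        low32 = int(group) & 0xFFFFFFFF
        return "33:33:" + ":".join("%02x" % ((low32 >> s) & 0xFF) for s in (24, 16, 8, 0))

    grp = solicited_node_group("2001:db8::abcd:1234")
    print(grp)                      # ff02::1:ffcd:1234
    print(ipv6_multicast_mac(grp))  # 33:33:ff:cd:12:34
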

Thomas