Re: [armd] review of draft-ietf-armd-problem-statement-02

Thomas Narten <narten@us.ibm.com> Fri, 25 May 2012 19:44 UTC

Message-Id: <201205251943.q4PJhfTb019425@cichlid.raleigh.ibm.com>
To: anoop@alumni.duke.edu
In-reply-to: <CA+-tSzxY2AdMqcOSDDY3A-o+wJj=Ww5FE4btEe1uPgDMbehANA@mail.gmail.com>
References: <CA+-tSzxY2AdMqcOSDDY3A-o+wJj=Ww5FE4btEe1uPgDMbehANA@mail.gmail.com>
Comments: In-reply-to Anoop Ghanwani <ghanwani@gmail.com> message dated "Thu, 03 May 2012 19:21:54 -0700."
Date: Fri, 25 May 2012 15:43:40 -0400
From: Thomas Narten <narten@us.ibm.com>
Cc: armd@ietf.org
Subject: Re: [armd] review of draft-ietf-armd-problem-statement-02

Hi Anoop.

Thanks for your very detailed review comments. I've adopted
most of them directly. Some questions below.

Anoop Ghanwani <ghanwani@gmail.com> writes:

> Section 4.4.1
> ============

> For consistency with the following 2 sections
> change title from Layer 3 to L3.

> "This topology is ideal for scenarios where servers
>   attached to a particular access switch generally run applications
>   that are are confined to using a single subnet."
> I'm not sure I agree with this.  There are many issues
> surrounding this, including the capabilities of the devices
> in the network, the use of multicast, and the preferences
> of the network administrator.

I agree that "is ideal" is too strong. How about if I say instead:

    This topology has benefits in scenarios ...

Would that address your concerns?

> "Even though
>    layer 2 traffic are still partitioned by VLANs, the fact that all
>    VLANs are enabled on all ports can lead to broadcast traffic on all
>    VLANs to traverse all links and ports, which is same effect as one
>    big Layer 2 domain. "
> I disagree with this because all VLANs would only
> need to be provisioned on the aggregation-facing ports.
> The disadvantage here is that a lot more broadcast traffic
> hits the aggregation layer, that when we need to cross
> VLAN boundaries the traffic must go all the way to the
> aggregation switch even though the source and destination
> may be on the same access switch, and the requirement
> for larger ARP tables at the aggregation switches.

I struggled with this for a long while. Is this text any better?:

     <t> When the L3 domain only extends to aggregation switches,
         hosts in any of the IP subnets configured on the aggregation
         switches can be reached via L2 through any access switch,
         provided the access switches enable all the VLANs.  This topology
         allows a greater level of flexibility as servers attached to
         any access switch can be reloaded with applications that have
         been provisioned with IP addresses from multiple prefixes as
         needed.  Further, in such an environment, VMs can migrate
         between racks without IP address changes.  The drawback of
         this design however is that multiple VLANs have to be enabled
         on all access switches and all access-facing ports on
         aggregation switches. Even though L2 traffic is still
         partitioned by VLANs, the fact that all VLANs are enabled on
         all ports can lead to broadcast traffic on all VLANs to
         traverse all links and ports, which is same effect as one big
         L2 domain on the access-facing side of the aggregation
         switch.  In addition, internal traffic itself might have to
         cross different L2 boundaries resulting in significant ARP/ND
         load at the aggregation switches.  This design provides a
         good tradeoff between flexibility and L2 domain size.  A
         moderate sized data center might utilize this approach to
         provide high availability services at a single location.
         </t>
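
(As a sanity check on the "one big L2 domain" point, here is a toy
Python sketch -- purely illustrative, all names made up, not part of
the draft -- of how the set of access links touched by a single VLAN's
broadcast grows once every VLAN is trunked everywhere:)

    def links_hit_by_broadcast(vlan, vlan_to_switches, all_switches, trunk_all=True):
        """Access switches whose uplinks must carry a broadcast on `vlan`.

        vlan_to_switches maps a VLAN to the access switches that actually
        have members in it (hypothetical provisioning data).  With
        trunk_all=True, every VLAN is enabled on every access switch, so
        the broadcast reaches all of them -- the "one big L2 domain" effect.
        """
        if trunk_all:
            return set(all_switches)
        return set(vlan_to_switches.get(vlan, ()))

    access_switches = ["acc%d" % i for i in range(40)]   # 40 hypothetical access switches
    membership = {10: ["acc0", "acc1", "acc2"]}           # VLAN 10 only lives behind 3 of them

    print(len(links_hit_by_broadcast(10, membership, access_switches, trunk_all=True)))   # 40
    print(len(links_hit_by_broadcast(10, membership, access_switches, trunk_all=False)))  # 3
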

> "However, the
>    Overlay Edge switches/routers which perform the network address
>    encapsulation/decapsulation must ultimately perform a L2 address
>    resolution and could still potentially face scaling issues at that
>    point."
> It's not the overlay edge switches that have the scaling
> problem, it's the volume of broadcasts that need to be
> sent across the core and that is not helped simply by
> using an L3 overlay.

I also struggled quite a bit with this comment. Is the following an
improvement?:

    <t> A potential problem arises in a large data center when a large
        number of hosts communicate with peers in different subnets:
        all of these hosts send (and receive) data packets through
        their respective L2/L3 boundary nodes, since the traffic flows
        are generally bi-directional.  This has the potential to
        further highlight any scaling problems.  These
        L2/L3 boundary nodes have to process ARP/ND requests sent from
        originating subnets and resolve physical (MAC) addresses in
        the target subnets for what are generally bi-directional
        flows.  Therefore, for maximum flexibility in managing the
        data center workload, it is often desirable to use overlays to
        place related groups of hosts in the same topological subnet
        to avoid the L2/L3 boundary translation.  The use of overlays
        in the data center network can be a useful design mechanism to
        help manage a potential bottleneck at the L2/L3 boundary by
        redefining where that boundary exists.  </t>
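
(To make the encapsulation point concrete, the kind of operation an
overlay edge node performs looks roughly like the sketch below.  It
assumes a VXLAN-style 8-byte header -- one common overlay format, not
something the draft text specifies -- and omits the outer
Ethernet/IP/UDP headers an edge device would also prepend:)

    import struct

    def vxlan_encapsulate(vni: int, inner_frame: bytes) -> bytes:
        # VXLAN-style header (RFC 7348 layout): 8 bytes total.
        # Word 1: flags byte with the I bit set (0x08), rest reserved.
        # Word 2: 24-bit VNI in the upper bits, low byte reserved.
        flags_word = 0x08 << 24
        vni_word = (vni & 0xFFFFFF) << 8
        header = struct.pack("!II", flags_word, vni_word)
        # A real overlay edge device would also add outer Ethernet/IP/UDP
        # headers (UDP destination port 4789 for VXLAN); omitted here.
        return header + inner_frame

    encapsulated = vxlan_encapsulate(vni=5001, inner_frame=b"\x00" * 64)
    print(len(encapsulated))   # 72 bytes: 8-byte header + 64-byte inner frame
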


> Section 6
> ==========

> "Thus, whereas all
>    nodes must process every ARP query, ND queries are processed only by
>    the nodes to which they are intended."
> When virtualization is in use, the NIC is often operated
> in promiscuous mode, which means that the packet would
> be delivered to the hypervisor/vswitch and the filtering
> would have to be done there (usually implemented in software),
> making the problem almost as bad as with ARP.

Revised text:

    Thus, whereas all nodes must process every ARP query, ND queries
    are processed only by the nodes for which they are intended. In
    cases where multicast filtering can't effectively be implemented
    in the NIC (e.g., on hypervisors supporting virtualization),
    filtering would need to be done in software (e.g., in the
    hypervisor's vSwitch).
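
(For anyone who wants to see why a NIC that is not in promiscuous mode
can filter ND but not ARP, here is a small illustrative Python sketch
of the solicited-node multicast group an ND Neighbor Solicitation is
sent to, and the Ethernet multicast MAC a NIC would program into its
filter.  With a promiscuous NIC, this same comparison has to happen in
the vSwitch in software:)

    import ipaddress

    def solicited_node_group(target: str) -> ipaddress.IPv6Address:
        """Solicited-node multicast address for an IPv6 target (RFC 4291):
        ff02::1:ff00:0/104 plus the low 24 bits of the target address."""
        low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
        return ipaddress.IPv6Address(int(ipaddress.IPv6Address("ff02::1:ff00:0")) | low24)

    def ipv6_multicast_mac(group: ipaddress.IPv6Address) -> str:
        """Ethernet multicast MAC for an IPv6 group: 33:33 followed by
        the low 32 bits of the group address."""
        low32 = int(group) & 0xFFFFFFFF
        return "33:33:" + ":".join("%02x" % ((low32 >> s) & 0xFF) for s in (24, 16, 8, 0))

    grp = solicited_node_group("2001:db8::abcd:1234")
    print(grp)                      # ff02::1:ffcd:1234
    print(ipv6_multicast_mac(grp))  # 33:33:ff:cd:12:34
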

Thomas