Re: [armd] Gen-art] review: draft-ietf-armd-problem-statement-03

Thomas Narten <narten@us.ibm.com> Wed, 29 August 2012 16:01 UTC

Message-Id: <201208291559.q7TFxgmJ012331@cichlid.raleigh.ibm.com>
To: "Joel M. Halpern" <jmh@joelhalpern.com>
In-reply-to: <502471DB.80303@joelhalpern.com>
References: <50243C05.3080006@nostrum.com> <502471DB.80303@joelhalpern.com>
Comments: In-reply-to "Joel M. Halpern" <jmh@joelhalpern.com> message dated "Thu, 09 Aug 2012 22:28:43 -0400."
Date: Wed, 29 Aug 2012 11:59:41 -0400
From: Thomas Narten <narten@us.ibm.com>
Cc: gen-art@ietf.org, "A. Jean Mahoney" <mahoney@nostrum.com>, "armd@ietf.org" <armd@ietf.org>
Subject: Re: [armd] Gen-art] review: draft-ietf-armd-problem-statement-03

Hi Joel.

Thanks for the review comments. (And sorry for taking so long to respond!)

"Joel M. Halpern" <jmh@joelhalpern.com>; writes:

> Major issues:
>      The use of the term "switch" seems confusing.  I had first assumed 
> that it meant an ethernet switch (which might have a bit of L3 smarts, 
> or might not.  I was trying not to be picky.)  But then, in section 6.3 
> it refers to "core switches ... are the data center gateways to external 
> networks" which means that those are routers.

The switch vs. router terminology is tricky.

6.3 says:

   Core switches connect multiple aggregation switches and are the data
   center gateway(s) to external networks or interconnect to different
   sets of racks within one data center.

How about I change that to:

   Core switches connect multiple aggregation switches and interface
   with data center gateway(s) to external networks or interconnect to
   different sets of racks within one data center.

I know that is just side stepping this a bit, but Section 6.4 has more
text about the L2/L3 boundaries in various deployments. This document
is walking a bit of a tightrope by trying to be general and not too
specific. If we get too specific, folk start screaming "that's not the
way my data center looks".

> Moderate Issue:
>     The document seems to be interestingly selective in what modern 
> technologies it chooses to mention.  Mostly it seems to be describing 
> problems with data center networks using technology more than 5 years 
> old.  Since that is the widely deployed practice, that is
>     defensible.

I think this has to do with how the WG was chartered.

> But then the document chooses to mention new work such as OpenFlow, 
> without mentioning the work IEEE has done on broadcast and multicast 
> containment for data centers.  It seems to me that we need to be 
> consistent, either describing only the widely deployed technology, or 
> including a fair mention of already defined and productized solutions 
> that are not yet widely deployed.

I'd be fine with taking out the references to OpenFlow. I don't think
it adds much to the document.

>      On a related note, the document assumes that multicast NDs are 
> delivered to all nodes, while in practice I believe existing techniques 
> to filter such multicast messages closer to the source are widely 
> deployed.  (Section 5.)

This paragraph has been significantly revised. The current proposed
text is:

	Broadly speaking, from the perspective of address resolution,
        IPv6's Neighbor Discovery (ND) behaves much like ARP, with a
        few notable differences. First, ARP uses broadcast, whereas ND
        uses multicast. Specifically, when querying for a target IP
        address, ND maps the target address into an IPv6 Solicited
        Node multicast address. Using multicast rather than broadcast
        has the benefit that the multicast frames do not necessarily
        need to be sent to all parts of the network, i.e., only to
        segments where listeners for the Solicited Node multicast
        address reside. In the case where multicast frames are
        delivered to all parts of the network, sending to a multicast
        address still has the advantage that most (if not all) nodes
        will filter out the (unwanted) multicast query via filters
        installed in the NIC rather than burdening host software with
        the need to process such packets. Thus, whereas all nodes must
        process every ARP query, ND queries are processed only by the
        nodes for which they are intended. In cases where multicast
        filtering can't effectively be implemented in the NIC (e.g.,
        as on hypervisors supporting virtualization), filtering would
        need to be done in software (e.g., in the hypervisor's
        vSwitch).

Is that better?
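For concreteness, the Solicited-Node mapping described above can be sketched in a few lines. This is an illustrative helper (the function name is invented), applying only the RFC 4291 rule that the low-order 24 bits of the target address are appended to the ff02::1:ff00:0/104 prefix:

```python
import ipaddress

def solicited_node_multicast(addr: str) -> str:
    """Map a unicast IPv6 address to its Solicited-Node multicast
    address (RFC 4291, Section 2.7.1): the prefix ff02::1:ff00:0/104
    combined with the low-order 24 bits of the target address."""
    target = ipaddress.IPv6Address(addr)
    low24 = int(target) & 0xFFFFFF  # last 24 bits of the target
    prefix = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return str(ipaddress.IPv6Address(prefix | low24))

# Hosts joining this group on a segment is what lets switches doing
# MLD snooping keep ND queries off segments with no listeners.
print(solicited_node_multicast("2001:db8::123:4567"))  # ff02::1:ff23:4567
```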

> Minor issues:
>      I presume that section 6.4.2 which describes needing to enable all 
> VLANs on all aggregation ports is a description of current practice, 
> since it is not a requirement of current technologies, either via VLAN 
> management or orchestration?

Yes.

>      Section 6.4.4 seems very odd.  The title is "overlays".  Are there 
> widely deployed overlays?

I keep hearing yes, but proprietary, so little can be said about them.

> If so, it would be good to name the 
> technologies being referred to here.  If this is intended to refer to 
> the overlay proposal in IETF and IEEE, I think that the characterization 
> is somewhat misleading, and probably is best simply removed.

Hmm, I didn't actually write this text. It originally came from
draft-karir-armd-datacenter-reference-arch, which was merged into the
problem statement document by the WG.

I agree this section is kind of fuzzy.

I'm on the fence about what to do. Are there other opinions?

>      Is the fifth paragraph of section 7.1 on ARP processing and 
> buffering in the absence of ARP cache entries accurate?  I may well be 
> out of date, but it used to be the case that most routers dropped the 
> packets, and some would buffer 1 packet deep at most.  This description 
> indicates a rather more elaborate behavior.

RFC 1122 says:

         2.3.2.2  ARP Packet Queue

            The link layer SHOULD save (rather than discard) at least
            one (the latest) packet of each set of packets destined to
            the same unresolved IP address, and transmit the saved
            packet when the address has been resolved.

RFC 1812 says:

3.3.2 Address Resolution Protocol - ARP

   Routers that implement ARP MUST be compliant and SHOULD be
   unconditionally compliant with the requirements in [INTRO:2].

   The link layer MUST NOT report a Destination Unreachable error to IP
   solely because there is no ARP cache entry for a destination; it
   SHOULD queue up to a small number of datagrams briefly while
   performing the ARP request/reply sequence, and reply that the
   destination is unreachable to one of the queued datagrams only when
   this proves fruitless.


>      Given that this document says it is a general document about 
> scaling issues for data centers, I am surprised that the security 
> considerations section does not touch on the increased complexity of 
> segregating subscriber traffic (customer A can not talk to customer B) 
> when there are very large numbers of customers, and the interaction of 
> this with L2 scope.

The ARMD WG struggled a bit about scope, and all it was chartered to
do was a problem statement related to address resolution.

Looking at the title of the document "Problem Statement for ARMD", I'd
argue that's not helpful for an RFC given that ARMD will close and
there is no follow-up WG planned. How about I change the title to
something like:

    Address Resolution Problems in Large Data Center Networks

I don't want to add other issues like traffic segregation to the
document at this point. Among other things, the WG really doesn't
have the energy for this... The intro is pretty clear (IMO) about the
limited scope of the document.

Thomas