Re: [armd] Ralph Droms' No Objection on draft-ietf-armd-problem-statement-03: (with COMMENT)
Thomas Narten <narten@us.ibm.com> Thu, 30 August 2012 00:42 UTC
Message-Id: <201208300042.q7U0gNJJ018727@cichlid.raleigh.ibm.com>
To: Ralph Droms <rdroms.ietf@gmail.com>
In-reply-to: <20120829182602.22800.41833.idtracker@ietfa.amsl.com>
References: <20120829182602.22800.41833.idtracker@ietfa.amsl.com>
Comments: In-reply-to "Ralph Droms" <rdroms.ietf@gmail.com> message dated "Wed, 29 Aug 2012 11:26:02 -0700."
Date: Wed, 29 Aug 2012 20:42:23 -0400
From: Thomas Narten <narten@us.ibm.com>
Cc: The IESG <iesg@ietf.org>, armd@ietf.org
Subject: Re: [armd] Ralph Droms' No Objection on draft-ietf-armd-problem-statement-03: (with COMMENT)
Hi Ralph.

"Ralph Droms" <rdroms.ietf@gmail.com> writes:

> Ralph Droms has entered the following ballot position for
> draft-ietf-armd-problem-statement-03: No Objection

> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)

> Please refer to http://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.

> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------

> 1. In section 7.1, does a high volume of ARP traffic have more impact
> on routers than on hosts or VMs? If so, why?

I think the answer is in some cases yes. At one level, the amount of
ARP traffic a router receives on a given network is the same as what a
host receives. But (to cut to the chase) there are a number of reasons
why the problem can be worse for routers:

1) Router architectures in practice can result in hosts being able to
   handle a higher rate of ARP requests than routers. One can argue
   that routers should just fix their implementations, but that
   doesn't change the fact that in some deployments/implementations
   there are issues.

2) Routers sometimes have way more networks hanging off of them than
   hosts do. E.g., a router might have 100 interfaces (to 100
   different networks, each generating ARP traffic the router would
   need to process), whereas a host would be on only one network and
   hence see a lot less traffic. Hence, a router might see 100x more
   ARP traffic than one host.

3) Routers are the targets of a lot of communication, so a lot of ARP
   traffic is aimed at them. (Forwarding data traffic is fast/easy and
   done by the ASIC; ARP processing is slow and done in the software
   processor.) I'm guessing a bit here, but I suspect that if you
   looked at a typical network, the average rate of ARP queries
   directed at nodes is likely higher for routers than for hosts.

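To make the scaling argument in point 2 concrete, here is a
back-of-the-envelope sketch. The per-network ARP rate is a made-up
illustrative number, not a measurement, and the function name is
hypothetical; the only assumption carried over from the text is that a
node sees the broadcast ARP traffic of every network it attaches to.

```python
def arp_load(networks: int, arp_rate_per_network: float) -> float:
    """ARP packets/sec a node must process, assuming it receives every
    broadcast ARP on each L2 network it is attached to."""
    return networks * arp_rate_per_network


# Hypothetical figure: broadcast ARP packets/sec on one L2 network.
RATE_PER_NETWORK = 50.0

host_load = arp_load(networks=1, arp_rate_per_network=RATE_PER_NETWORK)
router_load = arp_load(networks=100, arp_rate_per_network=RATE_PER_NETWORK)
ratio = router_load / host_load

print(f"host: {host_load} pps, router: {router_load} pps, ratio: {ratio}x")
```

The ratio depends only on the number of attached networks, not on the
assumed rate, which is the "100x" in the example above.
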
One other detail (that the document doesn't get into) is that more
recent implementations of Windows borrowed NUD from IPv6 and
retrofitted it into IPv4. Thus, they frequently generate unicast ARP
queries to revalidate entries associated with neighbors, just like
IPv6 does. This has noticeably increased the ARP traffic routers have
to process (on networks with more recent versions of Windows).

> 2. In section 7.1, does the total volume of ARP traffic ever become
> great enough to have a measurable impact on available traffic
> capacity?

What I'm told is that the CPU on routers can saturate or come close to
saturating, meaning that they become unable to process all the ARP
traffic and other essential routing functions as well. At this point,
you start having major problems (e.g., the router isn't responding to
other stuff it is supposed to in a timely manner).

> 3. Does this sentence from section 7.2 imply that IPv6 stacks that
> exhibit the described behavior are compliant with RFC 4861?

>    Consequently, some
>    implementations will send out "probe" ND queries to validate in-use
>    ND entries as frequently as every 35 seconds [RFC4861].

The above is the correct behavior as called for in 4861. While the
time may seem short, it's intended to ensure that recovery takes place
(should the router you are using go down) before TCP connections time
out.

> 4. I suggest dropping the sentence about the impact of VMs in section
> 7.3. Any growth in the datacenter that increases the number of
> addresses used in an L2 domain, whether it be the physical span of the
> L2 domain or the use of VMs, will have the impact described in section
> 7.3. The impact of growth will also have an impact on the scenarios
> in section 7.1 and 7.2. The specific impact of VMs is also mentioned
> earlier in the document.

But it is well documented that virtualization (using VMs) exacerbates
the problem. So I think saying so here is useful (even if redundant).

> 5. Are the three problems described in sections 7.1-3 really the only
> address resolution problems in large datacenters?

Well, they are the ones I know of and that the WG called out... Do you
think there are others?

> How do the three problems interact with each other (as mentioned at
> the end of section 7.3), when the ARP and ND problems seem to be
> related to CPU usage and the MAC table issue seems to be a memory
> problem.

The problem is just that the more processing that has to be done, the
fewer cycles there are to go around. And in some deployments there
aren't quite enough cycles, so anything that adds to the load is
potentially problematic...

> 6. It was a little surprising to me that section 5 describes multicast
> ND for address resolution, but section 7.2 only cites the unicast use
> of ND for NUD as a problem.

The problems with ND and ARP are not so much about the
bandwidth/network usage per se. It's really more about routers needing
to process such packets. That's where things start breaking down (in
some deployments). There aren't enough cycles in the router's service
processor to do the work... So whether the packets received are
multicast vs. unicast isn't the issue (for received packets).

That section wasn't really trying to focus on multicast vs.
unicast. Maybe that didn't come out as clearly as it could. I.e., the
first paragraph really should say that in terms of processing of ND
traffic on a router, many of the costs/issues are the same as in the
case of handling an ARP packet.

How about I change the first paragraph as follows:

old:

   Though IPv6's Neighbor Discovery behaves much like ARP there are
   several notable differences which result in a different set of
   potential issues. From an L2 perspective there is the simple
   difference between sending to a multicast versus broadcast address
   which results in ND queries only being processed by the nodes for
   which they are intended.

new:

   Though IPv6's Neighbor Discovery behaves much like ARP there are
   several notable differences which result in a different set of
   potential issues. From an L2 perspective, an important difference
   is that ND address resolution requests are sent via multicast,
   which results in ND queries only being processed by the nodes for
   which they are intended. This reduces the total number of ND
   packets that an implementation will receive compared with
   broadcast ARPs.

Thomas