Re: [armd] RtgDir review: draft-ietf-armd-problem-statement-03

"Bhatia, Manav (Manav)" <manav.bhatia@alcatel-lucent.com> Tue, 28 August 2012 08:25 UTC

From: "Bhatia, Manav (Manav)" <manav.bhatia@alcatel-lucent.com>
To: Thomas Narten <narten@us.ibm.com>
Date: Tue, 28 Aug 2012 13:50:59 +0530
Cc: "rtg-dir@ietf.org" <rtg-dir@ietf.org>, "armd@ietf.org" <armd@ietf.org>, "draft-ietf-armd-problem-statement.all@tools.ietf.org" <draft-ietf-armd-problem-statement.all@tools.ietf.org>, "rtg-ads@tools.ietf.org" <rtg-ads@tools.ietf.org>
Subject: Re: [armd] RtgDir review: draft-ietf-armd-problem-statement-03

Hi Thomas,

[clipped]
 
> This is poorly worded. How about I replace the paragraph with the
> following:
> 
> 	Broadly speaking, from the perspective of address resolution,
>         IPv6's Neighbor Discovery (ND) behaves much like ARP, with a
>         few notable differences. First, ARP uses broadcast, whereas ND
>         uses multicast. Specifically, when querying for a target IP
>         address, ND maps the target address into an IPv6 Solicited
>         Node multicast address. Using multicast rather than broadcast
>         has the benefit that the multicast frames do not necessarily
>         need to be sent to all parts of the network, i.e., only to
>         segments where listeners for the Solicited Node multicast
>         address reside. In the case where multicast frames are
>         delivered to all parts of the network, sending to a multicast
>         still has the advantage that most (if not all) nodes will
>         filter out the (unwanted) multicast query via filters
>         installed in the NIC rather than burdening host software with
>         the need to process such packets. Thus, whereas all nodes must
>         process every ARP query, ND queries are processed only by the
>         nodes to which they are intended. In cases where multicast
>         filtering can't effectively be implemented in the NIC (e.g.,
>         as on hypervisors supporting virtualization), filtering would
>         need to be done in software (e.g., in the hypervisor's
>         vSwitch).

> 
> > "may" seems to indicate that there are scenarios when a multicast
> >  from an L2 perspective will not be delivered to all nodes.
> 
> Correct.
> 
> > I am unable to envisage a scenario where this can happen. All BUM
> >  (broadcast, unlearnt unicast and multicast) traffic in vanilla L2
> >  and VPLS (Virtual Private LAN Service) is delivered to *all*
> >  nodes. There are exceptions in H-VPLS or if MMRP is enabled, but I
> >  doubt the authors had this in mind when they wrote the above
> >  text.
> 
> Hopefully the proposed text answers the above questions.

Thanks, the proposed text is much better.

However, the draft still says "multicast frames do not necessarily need to be sent to all parts of the network". I could be missing something, but there still seems to be a disconnect, because in the context of L2, multicast frames will be sent to all parts of the network.
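
To make the Solicited Node mapping in the proposed text concrete, here is a minimal Python sketch. It assumes the standard mappings (the low 24 bits of the target appended to ff02::1:ff00:0/104, and the low 32 bits of the group appended to 33:33 for Ethernet). The function names are illustrative only and do not come from the draft.

```python
import ipaddress

def solicited_node_multicast(target: str) -> ipaddress.IPv6Address:
    """Map a target IPv6 address to its Solicited-Node multicast group."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
    prefix = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(prefix | low24)

def ethernet_multicast_mac(group: ipaddress.IPv6Address) -> str:
    """Map an IPv6 multicast group to the Ethernet MAC a NIC filter would match."""
    low32 = int(group) & 0xFFFFFFFF
    octets = [0x33, 0x33] + list(low32.to_bytes(4, "big"))
    return ":".join(f"{o:02x}" for o in octets)

group = solicited_node_multicast("2001:db8::1:2345:6789")
print(group, ethernet_multicast_mac(group))
```

A NIC that filters on the resulting MAC address can drop queries for other targets in hardware. Whether the L2 network also restricts where such frames are flooded is a separate question, which is the point raised above.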

> 
> > 2. Sec 7.1 begins with the following text:
> 
> > "One pain point with large L2 broadcast domains is that the routers
> >  connected to the L2 domain need to process "a lot of" ARP traffic."
> 
> > I am not sure if this is correct with how an L2 broadcast domain has
> >  been defined in Sec 2. I would wager that a bigger pain point for a
> >  large L2 broadcast domain would be handling unknown unicast traffic
> >  that needs to get flooded, instead of dealing with the "ARP"
> >  traffic. I am aware of very very large L2 broadcast domains that
> >  have no ARP/ND scaling problems. Would it then make more sense to
> >  replace the L2 broadcast domain with an ARP/ND domain? If Yes, then
> >  ARP/ND domain too needs to be defined in Sec 2.
> 
> The issue (as has been discussed in ARMD) is specifically the ARP
> processing load (and not unknown unicast traffic). In typical
> implementations, ARP processing is done by a service processor with
> limited capacity. The cited problem is that the amount of ARP traffic
> places a significant load on that processor.
> 
> This is explained in the next paragraph. How about I add the following
> sentence to the 2nd paragraph:
> 
>      In some deployments, limitations on the rate of ARP processing
>      have been cited as being a problem.
> 
> Does that work?

Yes, it does, as long as you remove the original line that I quoted.

> 
> > 3. Sec 7.1 seems to suggest that Gratuitous ARPs pre-populate ARP
> >  caches on the neighboring devices. Without an explicit description
> >  of what a neighboring device is, I would presume that this also
> >  includes edge/core routers. In that case this statement is not
> >  entirely correct as I am aware of routers that will by default not
> >  pre-populate their ARP caches on receiving Gratuitous ARPs.
> 
> Right. The spec says "don't do this". But I believe it was asserted
> that some implementations do this. That said, I'm not aware of any
> such implementations. I would be willing to remove this sentence in
> the absence of known implementations of this.

This is clearly not the default behavior for several core/edge router implementations that I am aware of, so at best only a subset of routers do this. In that case, you need to fix the text that claims that *all* routers pre-populate their ARP caches upon receiving Gratuitous ARPs.

> 
> > 4. Sec 7.2 must also discuss the scaling impact of how the neighbor
> >  cache is maintained in IPv6 - especially the impact of moving the
> >  neighbor state from REACHABLE to STALE. Once the "IPv6 ARP" gets
> >  resolved the neighbor entry moves from the REACHABLE to STALE after
> >  around 30secs. The neighbor entry remains in this state till a
> >  packet needs to be forwarded to this neighbor. The first time a
> >  node sends a packet to a neighbor whose entry is STALE, the sender
> >  changes the state to DELAY and sets a timer to expire in around 5
> >  seconds. Most routers initiate moving the state from STALE to DELAY
> >  by punting a copy of the data packet to CPU so that the sender can
> >  reinitiate the Neighbor discovery process. This patently can be
> >  quite CPU and buffer intensive if the neighbor cache size is huge.
> 
> This could be. But the WG did not report such specific details in
> terms of actual problems reported from deployments.
> 
> Care to say more about what these "most implementations" are and how
> common they are? And are they the *only* way to implement this
> feature, or have other vendors chosen different implementations
> without this limitation?
> 
> That said, I could add the following to the document:
> 
> 	Routers implementing NUD (for neighboring destinations) will
> 	need to process neighbor cache state changes such as
> 	transitioning entries from REACHABLE to STALE. How this
> 	capability is implemented may impact the scalability of ND
> 	on a router. For example, one possible implementation is to
> 	have the forwarding operation detect when an ND entry is
> 	referenced that needs to transition from REACHABLE to STALE,
> 	by signalling an event that would need to be processed by the
> 	software processor. Such an implementation could increase the
> 	load on the service processor much in the same way that a high
> 	rate of ARP requests has led to problems on some routers.

Looks good.
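
The NUD transitions described in comment 4 can be sketched as a small state machine. This is only an illustration under stated assumptions: the timer values are the RFC 4861 defaults (randomization of the reachable time is omitted), reachability confirmation is not modelled, and the `punts` counter is a hypothetical stand-in for packets copied to the CPU, not a description of any particular router.

```python
from dataclasses import dataclass
from enum import Enum

REACHABLE_TIME = 30.0         # seconds; RFC 4861 base default, randomization omitted
DELAY_FIRST_PROBE_TIME = 5.0  # seconds; RFC 4861 default

class State(Enum):
    REACHABLE = "REACHABLE"
    STALE = "STALE"
    DELAY = "DELAY"
    PROBE = "PROBE"

@dataclass
class NeighborEntry:
    state: State = State.REACHABLE
    entered: float = 0.0  # time at which the current state was entered
    punts: int = 0        # hypothetical count of packets punted to the CPU

    def tick(self, now: float) -> None:
        """Apply the timer-driven transitions."""
        if self.state is State.REACHABLE and now - self.entered >= REACHABLE_TIME:
            self.state, self.entered = State.STALE, now
        elif self.state is State.DELAY and now - self.entered >= DELAY_FIRST_PROBE_TIME:
            self.state, self.entered = State.PROBE, now

    def forward_packet(self, now: float) -> None:
        """The first packet sent to a STALE neighbor moves the entry to DELAY."""
        self.tick(now)
        if self.state is State.STALE:
            self.punts += 1  # assumed: one packet copy reaches the CPU per STALE hit
            self.state, self.entered = State.DELAY, now
```

With a large neighbor cache, many entries can sit in STALE at once, so the number of such punts scales with the number of neighbors that become active again.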

[clipped]

> 
> 
> > 2. In Sec 7.1 you mention that routers need to drop all transit
> >  traffic when there is no response received for an ARP/ND
> >  request. You should mention that in addition to this, routers also
> >  need to send an ICMP host unreachable error packet back to the
> >  sender. ICMP error packets are generated in the control card
> >  CPU. So, if the CPU has to generate a high number of such ICMP
> >  errors then this can load the CPU. The whole process can be quite
> >  CPU as well as buffer intensive. The CPU/buffer overload is usually
> >  mitigated by rate limiting the number of ICMP errors generated.
> 
> Added:
> 
>    "and may send an ICMP destination unreachable message as well."

Why "may"? An implementation that does not send one is violating a standard.
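
On the rate limiting of ICMP errors mentioned in my original comment: a token bucket is one common way to do this. The sketch below is an assumption for illustration, not a description of any specific implementation; the class name and parameters are hypothetical.

```python
class TokenBucket:
    """Allow at most `rate` ICMP errors per second, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum number of stored tokens
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous call

    def allow(self, now: float) -> bool:
        """Return True if an ICMP error may be generated at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: drop silently, no ICMP error is sent
```

Errors beyond the configured rate are simply not generated, which bounds the CPU and buffer cost at the expense of some senders never seeing the unreachable message.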

> 
> > 3. In Sec 7.1 you mention that the entire ARP/ND process can be
> >  quite CPU intensive since transit data traffic needs to be queued
> >  while the address resolution is underway. You could mention that
> >  this is mitigated by offloading the queuing part to the line card
> >  CPUs so that the CPU on the control card is not inundated with such
> >  packets. This obviously would only work on distributed systems that
> >  have separate CPUs on the line cards and the main card.
> 
> There are many things one could say about ARP implementations. But
> that is not the purpose of this document. It is really about outlining
> the problems... So I think the above is getting too detailed.
> 
> > 4. Sec 7.1 should mention that this could be used as a DoS attack
> >  wherein the attacker sends a high volume of packets for which ARPs
> >  need to be resolved. This could result in genuine packets that need
> >  to resolve ARPs getting dropped as there is only a finite rate at
> >  which packets are sent to CPU for ARP resolution. Again this is
> >  both CPU and buffer intensive.
> 
> Again, I don't think this document needs to cover all aspects of ND.
> 
> > 5. Sec 7.2 discusses issues with the address resolution mechanism
> >  in IPv6. I think it's useful for this draft to discuss the fact
> >  that, unlike IPv4, IPv6 has subnets that are /64. This number is
> >  quite large and will perhaps cover trillions of IP addresses, most
> >  of which would be unassigned. Thus simplistic IPv6 ND
> >  implementations can be vulnerable to attacks that inundate the CPU
> >  with a huge number of requests to perform address resolution for a
> >  large number of IPv6 addresses, most of which are unassigned. As a
> >  result, genuine IPv6 devices will not be able to join the
> >  network. You might want to refer to RFC 6583 for more details.
> 
> Ditto.

I am fine with your resolution of comments 3 and 4. However, I believe that 5 ought to be discussed. This document is about ARP/ND issues that folks are either seeing or will see in large data centers. Given this, I don't see why it should not be discussed in this draft. I think it's quite reasonable to address the above-mentioned aspect of IPv6 ND, and one way of drawing attention to the issue is to discuss it here in this draft.
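
To put a number on the size of a /64, here is a small arithmetic sketch. The probe rate used in the example is a hypothetical figure chosen only for illustration.

```python
import ipaddress

def subnet_size(prefix_len: int) -> int:
    """Number of addresses in an IPv6 subnet with the given prefix length."""
    return 2 ** (128 - prefix_len)

def years_to_sweep(prefix_len: int, probes_per_second: float) -> float:
    """Time to send one packet to every address in the subnet, in years."""
    seconds = subnet_size(prefix_len) / probes_per_second
    return seconds / (365 * 24 * 3600)

# Cross-check against the standard library.
assert subnet_size(64) == ipaddress.ip_network("2001:db8::/64").num_addresses

print(subnet_size(64))                # 18446744073709551616
print(years_to_sweep(64, 1_000_000))  # hypothetical rate of 1M probes per second
```

A /64 holds 2^64 addresses, about 1.8 x 10^19. An attacker therefore never runs out of unassigned targets, each of which can trigger an address resolution attempt on the router.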

> 
> > 7. Sec 11 - Security Considerations should at the very least give
> >  pointers to references on issues related to ARP security
> >  vulnerabilities. I don't see IPv6 ND mentioned at all. Since ND
> >  relies on ICMPv6 and does not run directly over layer 2, there
> >  could possibly be security concerns specific to ND in the data
> >  center environments that don't apply to ARP. This document ought to
> >  discuss those so that ARMD (or some other WG) can look at solutions
> >  addressing those concerns.
> 
> Actually, I disagree somewhat. This document doesn't need to get into
> all the security issues of ARP and/or ND. For one thing, they did not
> come up as "problems" in ARMD. :-) I will put in pointers to the ND
> security considerations section. How about I add the following
> sentence:
> 
>     Security considerations for Neighbor Discovery are discussed in
>     <xref target="RFC4861"></xref> and <xref target="RFC6583"></xref>.

This should be good. I assume that this then means that there are no additional security concerns with ARP/ND in data centers.

Can you also remove the first line from the Security Considerations section? It's redundant and has already been said earlier.

> 
> > 8. Should it be mentioned in the document somewhere (sec 11?) that
> >  data center administrators can configure ACLs to filter packets
> >  addressed to unallocated IPv6 addresses? Folks can consider the
> >  valid IPv6 address ranges and filter out packets that use the
> >  unallocated addresses. Doing this will avoid unnecessary ARP
> >  resolution for invalid IPv6 addresses. The list of the IPv6
> >  addresses that are legitimate and should be permitted is small and
> >  maintainable because of IPv6's address hierarchy.
> >  http://www.iana.org/assignments/ipv6-unicast-address-assignments/ipv6-unicast-address-assignments.xml
> >  gives a list of large address blocks that have been allocated by
> >  IANA.
> 
> IMO no. This goes beyond the scope of this document.

While I don't see any harm in mentioning this, I leave it to you/the WG to decide whether to include it.

I just noticed that Sec 8 - Summary is redundant. Shouldn't that entire text be moved to either the Abstract or the Introduction?
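
For what it is worth, the filtering idea in comment 8 amounts to a prefix match against a short permit list. A minimal sketch follows; the prefixes are documentation-range placeholders (hypothetical), and a real ACL would use the operator's own assigned ranges and be applied in the forwarding plane, not in Python.

```python
import ipaddress

# Hypothetical permit list; replace with the prefixes actually in use.
PERMITTED = [
    ipaddress.ip_network("2001:db8:1::/48"),
    ipaddress.ip_network("2001:db8:2::/48"),
]

def is_permitted(dst: str) -> bool:
    """Return True if the destination falls inside a permitted prefix."""
    addr = ipaddress.ip_address(dst)
    return any(addr in net for net in PERMITTED)

print(is_permitted("2001:db8:1::10"))   # inside the first prefix
print(is_permitted("2001:db8:ffff::1")) # outside both prefixes
```

Packets that fail the check are dropped before they can trigger address resolution for an address that can never exist on the segment.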

Cheers, Manav