Re: [armd] RtgDir review: draft-ietf-armd-problem-statement-03

Thomas Narten <narten@us.ibm.com> Wed, 29 August 2012 14:58 UTC

Message-Id: <201208291458.q7TEwgxI011886@cichlid.raleigh.ibm.com>
To: "Bhatia, Manav (Manav)" <manav.bhatia@alcatel-lucent.com>
In-reply-to: <7C362EEF9C7896468B36C9B79200D8350D06450BB6@INBANSXCHMBSA1.in.alcatel-lucent.com>
References: <7C362EEF9C7896468B36C9B79200D8350D063A0AF5@INBANSXCHMBSA1.in.alcatel-lucent.com> <201208272124.q7RLOnx7015943@cichlid.raleigh.ibm.com> <7C362EEF9C7896468B36C9B79200D8350D06450BB6@INBANSXCHMBSA1.in.alcatel-lucent.com>
Comments: In-reply-to "Bhatia, Manav (Manav)" <manav.bhatia@alcatel-lucent.com> message dated "Tue, 28 Aug 2012 13:50:59 +0530."
Date: Wed, 29 Aug 2012 10:58:42 -0400
From: Thomas Narten <narten@us.ibm.com>
Cc: "rtg-dir@ietf.org" <rtg-dir@ietf.org>, "armd@ietf.org" <armd@ietf.org>, "draft-ietf-armd-problem-statement.all@tools.ietf.org" <draft-ietf-armd-problem-statement.all@tools.ietf.org>, "rtg-ads@tools.ietf.org" <rtg-ads@tools.ietf.org>
Subject: Re: [armd] RtgDir review: draft-ietf-armd-problem-statement-03

"Bhatia, Manav (Manav)" <manav.bhatia@alcatel-lucent.com> writes:
> Thanks, the proposed text is much better.

> However, the draft still says "multicast frames do not necessarily
>  need to be sent to all parts of the network". I could be missing
>  something but there still seems to be some disconnect because in
>  the context of L2, multicast frames will be sent to all parts of
>  the network.

L2 IGMP snooping may be taking place, which can then result in multicast
traffic not being forwarded everywhere in the L2 broadcast domain...
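
To make the snooping point concrete, here is a rough sketch of the
forwarding decision. It is a toy model and not taken from any
implementation: the class and method names are hypothetical, and it
assumes that frames for a group go only to ports with known listeners
plus any router ports.

```python
class SnoopingSwitch:
    """Toy L2 switch that learns multicast listeners per port (hypothetical model)."""

    def __init__(self, ports, router_ports=()):
        self.ports = set(ports)
        self.router_ports = set(router_ports)
        self.members = {}  # group address -> set of ports with listeners

    def igmp_report(self, port, group):
        # A membership report seen on a port marks that port as a listener.
        self.members.setdefault(group, set()).add(port)

    def igmp_leave(self, port, group):
        self.members.get(group, set()).discard(port)

    def flood_ports(self, ingress):
        # Without snooping: every port in the broadcast domain except ingress.
        return self.ports - {ingress}

    def snooped_ports(self, ingress, group):
        # With snooping: only listener ports and router ports, minus ingress.
        listeners = self.members.get(group, set())
        return (listeners | self.router_ports) - {ingress}


sw = SnoopingSwitch(ports=[1, 2, 3, 4])
sw.igmp_report(2, "239.1.1.1")
print(sorted(sw.flood_ports(1)))                 # ports reached without snooping
print(sorted(sw.snooped_ports(1, "239.1.1.1")))  # ports reached with snooping
```

With one listener on port 2, the snooped frame leaves on a single port
rather than on all three other ports.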

> > 
> > > 2. Sec 7.1 begins with the following text:
> > 
> > > "One pain point with large L2 broadcast domains is that the routers
> > >  connected to the L2 domain need to process "a lot of" ARP traffic."
> > 
> > > I am not sure if this is correct with how an L2 broadcast domain has
> > >  been defined in Sec 2. I would wager that a bigger pain point for a
> > >  large L2 broadcast domain would be handling unknown unicast traffic
> > >  that needs to get flooded, instead of dealing with the "ARP"
> > >  traffic. I am aware of very very large L2 broadcast domains that
> > >  have no ARP/ND scaling problems. Would it then make more sense to
> > >  replace the L2 broadcast domain with an ARP/ND domain? If Yes, then
> > >  ARP/ND domain too needs to be defined in Sec 2.
> > 
> > The issue (as has been discussed in ARMD) is specifically the ARP
> > processing load (and not unknown unicast traffic). In typical
> > implementations, ARP processing is done by a service processor with
> > limited capacity. The cited problem is that the amount of ARP traffic
> > places a significant load on that processor.
> > 
> > This is explained in the next paragraph. How about I add the following
> > sentence to the 2nd paragraph:
> > 
> >      In some deployments, limitations on the rate of ARP processing
> >      have been cited as being a problem.
> > 
> > Does that work?

> Yes it does as long as you remove the original line that I had
>  quoted.

Removing that line IMO removes something essential. It is the case
that some routers (i.e., devices at the edge of an L2 boundary) do
not have sufficient resources to process "a lot of ARP traffic". "A
lot" is in quotes because we don't have an exact figure for what that
is. This is one of the key points to come out of the ARMD effort.

What exactly do you object to in that sentence?

> > 
> > > 3. Sec 7.1 seems to suggest that Gratuitous ARPs pre-populate ARP
> > >  caches on the neighboring devices. Without an explicit description
> > >  of what a neighboring device is, I would presume that this also
> > >  includes edge/core routers. In that case this statement is not
> > >  entirely correct as I am aware of routers that will by default not
> > >  pre-populate their ARP caches on receiving Gratuitous ARPs.
> > 
> > Right. The spec says "don't do this". But I believe it was asserted
> > that some implementations do this. That said, I'm not aware of any
> > such implementations. I would be willing to remove this sentence in
> > the absence of known implementations of this.

To clarify, the current text says "Some routers can be configured to
broadcast periodic gratuitous ARPs."

This statement is true, and presumably you are not objecting to
that, right?

Note also that Warren Kumari
(http://www.ietf.org/mail-archive/web/armd/current/msg00489.html)
reports that Cisco IOS at one point could be configured to pre-populate
ARP caches via received gratuitous ARPs.

> This clearly is not the default behavior for several core/edge
>  router implementations that I am aware of. So at best there could
>  be a subset of routers that do this.

Which I believe is consistent with the current text saying "some
routers".

> In which case you need to fix
>  the text that claims that *all* routers pre-populate ARP caches
>  upon receiving Gratuitous ARPs.

How about I change the sentence:

    Gratuitous ARPs, broadcast to all nodes in the L2 broadcast
    domain, can also pre-populate ARP caches on neighboring devices,
    further reducing ARP traffic.

to:

    Gratuitous ARPs, broadcast to all nodes in the L2 broadcast
    domain, may in some cases also pre-populate ARP caches on
    neighboring devices, further reducing ARP traffic. But it is not
    believed that pre-population of ARP entries is supported by most
    implementations, as the ARP specification <xref
    target="RFC0826"></xref> recommends only that pre-existing ARP
    entries be updated upon receipt of ARP messages; it does not call
    for the creation of new entries when none already exist.
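
The distinction in the proposed text can be sketched as follows. This
is a simplified reading of the RFC 826 packet-reception steps, not
code from any implementation. The function name and the
create_on_gratuitous switch are hypothetical; the switch stands in
for the non-default, configurable behavior that some routers are
reported to support.

```python
def process_arp(cache, my_ip, sender_ip, sender_mac, target_ip,
                create_on_gratuitous=False):
    """Apply one received ARP message to a cache (dict: IP -> MAC).

    Simplified from the RFC 826 reception algorithm: an existing entry
    for the sender is always updated, but a new entry is created only
    when this node is the target of the message.
    """
    merged = False
    if sender_ip in cache:
        cache[sender_ip] = sender_mac      # update a pre-existing entry
        merged = True
    if not merged:
        if target_ip == my_ip:
            cache[sender_ip] = sender_mac  # we are the target: create entry
        elif create_on_gratuitous and sender_ip == target_ip:
            # Hypothetical opt-in: learn from a gratuitous ARP
            # (sender IP == target IP) even with no prior entry.
            cache[sender_ip] = sender_mac
    return cache


# A gratuitous ARP from 10.0.0.5, received by a neighbor at 10.0.0.1.
print(process_arp({}, "10.0.0.1", "10.0.0.5", "aa:aa", "10.0.0.5"))
print(process_arp({"10.0.0.5": "bb:bb"}, "10.0.0.1",
                  "10.0.0.5", "aa:aa", "10.0.0.5"))
print(process_arp({}, "10.0.0.1", "10.0.0.5", "aa:aa", "10.0.0.5",
                  create_on_gratuitous=True))
```

The first call leaves the cache empty, the second refreshes the
existing entry, and only the third (with the opt-in switch) creates a
new one.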

> > > 2. In Sec 7.1 you mention that routers need to drop all transit
> > >  traffic when there is no response received for an ARP/ND
> > >  request. You should mention that in addition to this, routers also
> > >  need to send an ICMP host unreachable error packet back to the
> > >  sender. ICMP error packets are generated in the control card
> > >  CPU. So, if the CPU has to generate a high number of such ICMP
> > >  errors then this can load the CPU. The whole process can be quite
> > >  CPU as well as buffer intensive. The CPU/buffer overload is usually
> > >  mitigated by rate limiting the number of ICMP errors generated.
> > 
> > Added:
> > 
> >    "and may send an ICMP destination unreachable message as well."

> Why a "may"? An implementation is violating a standard if it isn't.

They might not if rate limiting says otherwise. I.e., there are times
when an ICMP error won't be sent, and that is not in violation of the
spec.
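
As a rough illustration of why "may" fits, consider a router that
rate limits ICMP errors with a token bucket. The token bucket is an
assumption here (it is one common way to rate limit), and all names
and parameters are hypothetical.

```python
class TokenBucket:
    """Simple token bucket; time is passed in so the sketch is deterministic."""

    def __init__(self, rate, burst):
        self.rate = rate      # tokens added per second
        self.burst = burst    # maximum number of stored tokens
        self.tokens = burst
        self.last = 0.0

    def allow(self, now):
        # Refill for the elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def on_resolution_failure(bucket, now):
    """Actions taken for a transit packet whose next hop failed to resolve."""
    actions = ["drop"]
    if bucket.allow(now):
        actions.append("icmp_unreachable")
    return actions


bucket = TokenBucket(rate=1.0, burst=2)
for t in (0.0, 0.0, 0.0, 1.0):
    print(t, on_resolution_failure(bucket, t))
```

The packet is always dropped, but the ICMP error is sent only while
tokens remain, so the third failure at time 0.0 produces no ICMP
message.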

> > > 3. In Sec 7.1 you mention that the entire ARP/ND process can be
> > >  quite CPU intensive since transit data traffic needs to be queued
> > >  while the address resolution is underway. You could mention that
> > >  this is mitigated by offloading the queuing part to the line card
> > >  CPUs so that the CPU on the control card is not inundated with such
> > >  packets. This obviously would only work on distributed systems that
> > >  have separate CPUs on the line cards and the main card.
> > 
> > There are many things one could say about ARP implementations. But
> > that is not the purpose of this document. It is really about outlining
> > the problems... So I think the above is getting too detailed.
> > 
> > > 4. Sec 7.1 should mention that this could be used as a DoS attack
> > >  wherein the attacker sends a high volume of packets for which ARPs
> > >  need to be resolved. This could result in genuine packets that need
> > >  to resolve ARPs getting dropped as there is only a finite rate at
> > >  which packets are sent to CPU for ARP resolution. Again this is
> > >  both CPU and buffer intensive.
> > 
> > Again, I don't think this document needs to cover all aspects of ND.
> > 
> > > 5. Sec 7.2 discusses issues with address resolution mechanism in
> > >  IPv6. I think its useful for this draft to discuss the fact that
> > >  unlike IPv4, IPv6 has subnets that are /64. This number is quite
> > >  large and will perhaps cover trillions of IP addresses, most of
> > >  which would be unassigned. Thus simplistic IPv6 ND implementations
> > >  can be vulnerable to attacks which inundates the CPU with huge
> > >  requests to perform address resolution for a large number of IPv6
> > >  addresses, most of which are unassigned. As a result of this
> > >  genuine IPv6 devices will not be able to join the network. You
> > >  might want to refer to RFC 6583 for more details.
> > 
> > Ditto.

> I am fine with your resolution to the comments 3 and 4. However, I
>  believe that 5 ought to be discussed. This document is about ARP/ND
>  issues that folks are either seeing or will see in large data
>  centers.

To clarify: "are seeing". We can speculate at length about what problems
will be seen in the future. :-)

> Given this, I don't see why this should not even be discussed in
>  this draft. I think its quite reasonable to address the above
>  mentioned aspect of IPv6 ND and one of way getting attention to
>  issue is by discussing this here in this draft.

The issue you raise above is fully documented in RFC 6583, which I
have added to the references (per my previous note).

> > > 7. Sec 11 - Security Considerations should at the very least give
> > >  pointers to references on issues related to ARP security
> > >  vulnerabilities. I don't see IPv6 ND mentioned at all. Since ND
> > >  relies on ICMPv6 and does not run directly over layer 2, there
> > >  could possibly be security concerns specific to ND in the data
> > >  center environments that don't apply to ARP. This document ought to
> > >  discuss those so that ARMD (or some other WG) can look at solutions
> > >  addressing those concerns.
> > 
> > Actually, I disagree somewhat. This document doesn't need to get into
> > all the security issues of ARP and/or ND. For one thing, they did not
> > come up as "problems" in ARMD. :-) I will put in pointers to the ND
> > security considerations section. How about I add the following
> > sentence:
> > 
> >     Security considerations for Neighbor Discovery are discussed in
> >     <xref target="RFC4861"></xref> and <xref target="RFC6583"></xref>.

> This should be good. I assume that this then means that there are no
>  additional security concerns with ARPs/ND in data centers.

I don't recall any coming up in the WG.

> Can you also remove the first line from the Security Consideration?
>  Its redundant and has already been said earlier.

OK.

> > > 8. Should it be mentioned in the document somewhere (sec 11?) that
> > >  data center administrators can configure ACLs to filter packets
> > >  addressed to unallocated IPv6 addresses? Folks can consider the
> > >  valid IPv6 address ranges and filter out packets that use the
> > >  unallocated addresses. Doing this will avoid unnecessary ARP
> > >  resolution for invalid IPv6 addresses. The list of the IPv6
> > >  addresses that are legitimate and should be permitted is small and
> > >  maintainable because of IPv6's address
> > >  hierarchy.
> > >  http://www.iana.org/assignments/ipv6-unicast-address-assignments/ipv6-unicast-address-assignments.xml
> > >  gives a list of large address blocks that have been allocated by
> > >  IANA.
> > 
> > IMO no. This goes beyond the scope of this document.

> While I don't see any harm in mentioning this, I leave it on you/WG
>  to decide if you want to include this or not.

> I just noticed that Sec 8 - Summary, is redundant. Shouldnt that
>  entire text be moved to either the Abstract or the Introduction?

It's the last section of the document. The document needs a summary or
something (summary seems more accurate than conclusions, IMO).

Thomas