Re: [6man] New Version Notification for draft-nordmark-6man-impatient-nud-00.txt

Erik Nordmark <nordmark@acm.org> Wed, 25 May 2011 15:59 UTC

Message-ID: <4DDD2778.2030008@acm.org>
Date: Wed, 25 May 2011 08:59:52 -0700
From: Erik Nordmark <nordmark@acm.org>
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.17) Gecko/20110414 Lightning/1.0b2 Thunderbird/3.1.10
MIME-Version: 1.0
To: Ray Hunter <v6ops@globis.net>
Subject: Re: [6man] New Version Notification for draft-nordmark-6man-impatient-nud-00.txt
References: <4DDABDA6.2070705@globis.net> <4DDBE1BF.7040104@acm.org> <4DDBF9E9.1040702@globis.net>
In-Reply-To: <4DDBF9E9.1040702@globis.net>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: ipv6@ietf.org

On 5/24/11 11:33 AM, Ray Hunter wrote:
> Thanks very much for replying.
>
> I think that I understand the motivation in that multicast is expensive
> on some media, and that you thus want to avoid it.
>
> I'm always prepared to be dazzled by my lack of knowledge or incorrect
> assumptions. I'd much rather ask a dumb question and get a smart answer
> than just say nothing.
>
> The idea of the draft seems to be that spending more time performing
> unicast neighbor solicitations in the "probe" state might avoid deleting
> the neighbor entry, and thus relearning the entry via "expensive"
> multicast NS from the "-" state.

If the NCE is retained, there is the added benefit of knowing that a 
node with that IPv6 address exists (or recently existed) on the link, 
which is helpful to know when dealing with DoS scans across the IPv6 
address space.

> Seems perfectly reasonable and something worth pursuing.

Good.

> Erik Nordmark wrote:
>> Are you assuming that the routers inject host routes into the routing
>> system based on the ND state? The routers inject a route for the
>> subnet prefix which isn't tied to the ND state in any way.
>
> Yes, on the local link the router injects RA information based on a
> (statically configured or PD learned) prefix. But routers and other
> devices also redistribute reachability information elsewhere via other
> protocols.
>
> The assumption that ND really is independent of everything else is what
> I'm questioning myself, although I freely admit to a large dose of hand
> waving here. ND isn't like ARP, as you of course know.

In terms of the interaction with routing, I certainly don't know of any 
case where RFC 4861 behaves differently than RFC 826. Perhaps you can 
enlighten me on the differences you see in the interaction with routing?

> An ARP cache entry would sit there silently for 4 hours by default and
> do nothing, so packets could black hole if the next hop was learned via
> a static route. Higher level protocols would have to detect the problem
> themselves.

Such an ARP implementation wouldn't follow RFC 1122.
Note that RFC 1122 also has a MUST for dead gateway detection (with 
"gateway" being an old term for "router"), but it doesn't specify how 
that is performed, and I don't think it is commonly implemented in hosts.
NUD was designed to solve dead gateway detection, but generalized to 
nodes instead of just routers.

> ND removing an entry by NUD probe failure retriggers next hop
> determination, and AFAIK also actively triggers replying to remote nodes
> with an ICMP unreachable message, and so ND can thus effectively
> disseminate reachability information far further than just the local link.
>
> A later post I made gave an example of reachability info being
> indirectly based on ND (via a BGP neighbor peering TCP session becoming
> unreachable due to an ICMPv6 unreachable, leading to route information
> changing). I can imagine the same for EIGRP, OSPF if their neighbors
> disappear due to receipt of an ICMPv6 unreachable (although
> traditionally these implementations have tended to ignore ICMP for good
> reason).

Again, if you look at RFC 1122 you'll find that (in section 4.2.3.9) 
"TCP MUST NOT abort the connection" for the soft errors like many ICMP 
unreachables. Such soft errors might be for transient conditions, and 
the intention is that TCP survive such transients.
Thus I think your (hopefully hypothetical) example indicates a bad TCP 
implementation.
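
To make the soft/hard distinction concrete, here is a minimal Python sketch of how a TCP might classify ICMPv4 Destination Unreachable codes along the lines of RFC 1122 section 4.2.3.9. The names (Action, tcp_reaction) are hypothetical, and treating unlisted codes as soft is an assumption of this sketch rather than something the RFC states.

```python
from enum import Enum


class Action(Enum):
    KEEP_CONNECTION = "keep"    # soft error: report it, keep retransmitting
    ABORT_CONNECTION = "abort"  # hard error: the connection may be aborted


# ICMPv4 Destination Unreachable codes as grouped in RFC 1122, 4.2.3.9.
SOFT_CODES = frozenset({0, 1, 5})  # net unreachable, host unreachable, source route failed
HARD_CODES = frozenset({2, 3, 4})  # protocol unreachable, port unreachable, frag needed and DF set


def tcp_reaction(code: int) -> Action:
    """Return how TCP should react to a Destination Unreachable code."""
    if code in HARD_CODES:
        return Action.ABORT_CONNECTION
    # Codes 0, 1 and 5 MUST NOT abort the connection. Any other code is
    # treated as soft here, which is an assumption of this sketch.
    return Action.KEEP_CONNECTION
```

A host unreachable (code 1), which is roughly the IPv4 counterpart of the ICMPv6 address unreachable generated after a NUD failure, falls in the soft group, so the connection stays up.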

> Another sort of device that sometimes transmits reachability
> information via TCP is the WAN accelerator, which auto-builds network
> tunnels and then sends routing information across them. Again, an
> ICMPv6 unreachable might cause the device to tear down the tunnel.

Ditto.

> HSRP preference metrics can also potentially be influenced by
> reachability information (ND) from another link (via track commands).
>
> Then there are also those dreaded silent devices (that we don't talk
> much about but which are generally plonked on the very most critical
> link into the main data centre), such as network intrusion detection
> systems and firewalls, that actively monitor traffic across their links,
> but that don't take part in any official routing protocol exchanges, and
> can fail over to a back up system without informing anyone else by
> marking interfaces up and down.
>
> Using the example of spanning tree, waiting for STP would probably mean
> waiting 35 seconds (max_age + forwarding delay) in the default case for
> the root bridge to send out topology notification BPDUs. That's a long
> time in many protocols.

Sure. And a protocol like BGP might be tuned to time out in less than 
35 seconds (whether or not that is a good timeout isn't my point). 
But that wouldn't be affected by whether IPv4 or IPv6 is used.
I think there is good operational understanding of how to tune things 
for IPv4, and that shouldn't have to change as IPv6 gets more widely 
deployed. I don't see the ICMP address unreachable errors due to NUD 
failures making a difference, because TCP must not reset the connection 
as a result of them.
But I do see the three second NUD hard limit as being a case where 
IPv6/ND is different than IPv4/ARP for no good reason.
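
For reference, the three seconds fall out of the RFC 4861 defaults: MAX_UNICAST_SOLICIT (3) probes sent RetransTimer (1000 ms) apart in the PROBE state. The sketch below just does that arithmetic; the function name and the alternative probe count in the usage note are hypothetical and are not values taken from the draft.

```python
# Protocol constants from RFC 4861, section 10.
MAX_UNICAST_SOLICIT = 3   # unicast NS transmissions in the PROBE state
RETRANS_TIMER_MS = 1000   # default interval between retransmitted NS


def probe_window_ms(max_solicit: int = MAX_UNICAST_SOLICIT,
                    retrans_timer_ms: int = RETRANS_TIMER_MS) -> int:
    """Time from the first unicast probe until the entry is given up.

    Each probe is followed by a wait of retrans_timer_ms, including the
    last one, so the window is simply the product of the two values.
    """
    return max_solicit * retrans_timer_ms
```

With the defaults, probe_window_ms() gives 3000 ms. A node that was allowed, say, 10 probes (an arbitrary illustrative number) would keep the NCE for 10000 ms before falling back to multicast.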

> So I guess the question is also, do you want NUD to inform higher layers
> of the need for a fail over ASAP of a local link failure via ICMP
> unreachables (as Thomas seemed to suggest), or do you want ND to shut up
> and just keep on retrying locally and let those higher layer protocols
> hit their own time outs and take their own fail over actions?

There is a huge difference between making local decisions based on NUD 
failures (such as a host using a different default router) and trying to 
interpret the ICMP address unreachable from afar. My take is that the 
ICMP address unreachable is a useful diagnostic tool - ping and 
traceroute will report it - but it is a soft error, hence it doesn't make 
sense to have TCP or other protocols give up as a result of such ICMP 
errors.
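
As an illustration of the kind of local decision I mean, here is a rough Python sketch of default router selection in the spirit of RFC 4861 section 6.3.6: prefer a router whose Neighbor Cache entry is in any state other than INCOMPLETE. The function name and the data layout are hypothetical, and the fallback is simplified (the RFC describes round-robin selection when no router is known to be reachable).

```python
from typing import Dict, List, Optional


def select_default_router(routers: List[str],
                          cache_state: Dict[str, str]) -> Optional[str]:
    """Pick a default router using only local Neighbor Cache state.

    routers     -- addresses on the Default Router List, in list order
    cache_state -- NCE state per address; an address missing from the
                   dict has no entry (e.g. deleted after a NUD failure)
    """
    for router in routers:
        state = cache_state.get(router)
        # Reachable or probably reachable: any state except INCOMPLETE.
        if state is not None and state != "INCOMPLETE":
            return router
    # Simplification: fall back to the first router instead of round-robin.
    return routers[0] if routers else None
```

If NUD fails for the first router and its entry is deleted, the host moves to the next router on its list without any ICMP error having to be interpreted by anyone.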

> Current ND seems to go the ASAP route with its 3 second timeout.
> Historically, ARP seems to go the silent route.
>
> It just feels to me like all nodes on a common link should behave the
> same way in this respect (no scientific argument, just raw gut feeling
> of deja vu, and impending packet storms)

I don't understand why you think all the nodes on a link need to be 
coordinated in such a way; the Internet protocols are designed to be 
robust and not assume that all the nodes have the same code and tuning. 
For instance, we don't require ARP or TCP on all the nodes on a 
link to have the same timer values, and things work just fine.

    Erik

> And if all nodes on the link aren't behaving the same way, don't you
> still get say 50% of the multicasts as the partner nodes revert to the
> "-" state by timing out "too fast" for that link type?
>
> Just seems like another reason to have this as a "per link" parameter
> rather than a "per node" parameter.
>
> Best regards,
> RayH
>