Re: [6man] New Version Notification for draft-nordmark-6man-impatient-nud-00.txt

Ray Hunter <v6ops@globis.net> Tue, 24 May 2011 18:33 UTC

Message-ID: <4DDBF9E9.1040702@globis.net>
Date: Tue, 24 May 2011 20:33:13 +0200
From: Ray Hunter <v6ops@globis.net>
User-Agent: Postbox Express 1.0.1 (Macintosh/20100705)
MIME-Version: 1.0
To: Erik Nordmark <nordmark@acm.org>
Subject: Re: [6man] New Version Notification for draft-nordmark-6man-impatient-nud-00.txt
References: <4DDABDA6.2070705@globis.net> <4DDBE1BF.7040104@acm.org>
In-Reply-To: <4DDBE1BF.7040104@acm.org>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: ipv6@ietf.org

Thanks very much for replying.

I think that I understand the motivation in that multicast is expensive 
on some media, and that you thus want to avoid it.

I'm always prepared to be dazzled by my lack of knowledge or incorrect 
assumptions. I'd much rather ask a dumb question and get a smart answer 
than just say nothing.

The idea of the draft seems to be that spending more time performing
unicast neighbor solicitations in the "probe" state might avoid deleting
the neighbor entry, and thus avoid relearning the entry via "expensive"
multicast NS from the "-" state.
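
To make that trade-off concrete, here is a rough sketch rather than
anything taken from the draft. It assumes the neighbor goes silent at
the moment the entry enters the "probe" state, that unicast probes go
out every RetransTimer, and that any probe sent after the neighbor
returns is answered. The defaults (3 probes, 1 second apart) are the
RFC 4861 values; the function names are made up for illustration.

```python
def probe_times(max_unicast_solicit, retrans_timer_s):
    """Times (seconds after entering PROBE) at which unicast NS are sent."""
    return [k * retrans_timer_s for k in range(max_unicast_solicit)]


def entry_survives(outage_s, max_unicast_solicit=3, retrans_timer_s=1.0):
    """True if at least one probe is sent after the neighbor is back.

    Simplified model (an assumption, not the draft's algorithm): the
    outage starts at t=0, and a probe sent at or after outage_s gets a
    reply, which keeps the neighbor cache entry alive. Otherwise the
    entry is deleted and the next packet needs a multicast NS.
    """
    times = probe_times(max_unicast_solicit, retrans_timer_s)
    return any(t >= outage_s for t in times)


if __name__ == "__main__":
    for probes in (3, 10):
        for outage in (1.5, 2.5, 8.0):
            kept = entry_survives(outage, max_unicast_solicit=probes)
            print(f"probes={probes:2d} outage={outage:4.1f}s kept={kept}")
```

With the default three probes (sent at 0, 1 and 2 seconds) a 2.5 second
outage loses the entry in this model, whereas ten probes would ride it
out without falling back to multicast.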

Seems perfectly reasonable and something worth pursuing.

Erik Nordmark wrote:
> Are you assuming that the routers inject host routes into the routing 
> system based on the ND state? The routers inject a route for the 
> subnet prefix which isn't tied to the ND state in any way. 

Yes, on the local link the router injects RA information based on a 
(statically configured or PD learned) prefix. But routers and other 
devices also redistribute reachability information elsewhere via other 
protocols.

The assumption that ND really is independent of everything else is what 
I'm questioning myself, although I freely admit to a large dose of hand 
waving here. ND isn't like ARP, as you of course know.

An ARP cache entry would sit there silently for 4 hours by default and 
do nothing, so packets could black hole if the next hop was learned via 
a static route. Higher level protocols would have to detect the problem 
themselves.

ND removing an entry after a NUD probe failure retriggers next hop
determination, and AFAIK also actively triggers replying to remote nodes
with an ICMP unreachable message, so ND can effectively disseminate
reachability information far further than just the local link.

A later post I made gave an example of reachability info being 
indirectly based on ND (via a BGP neighbor peering TCP session becoming 
unreachable due to an ICMPv6 unreachable, leading to route information 
changing). I can imagine the same for EIGRP, OSPF if their neighbors 
disappear due to receipt of an ICMPv6 unreachable (although 
traditionally these implementations have tended to ignore ICMP for good 
reason).

Another sort of device that sometimes transmits reachability
information via TCP is the WAN accelerator, which auto-builds network
tunnels and then sends routing information across them. Again, an
ICMPv6 unreachable might cause the device to tear down the tunnel.

HSRP preference metrics can also potentially be influenced by 
reachability information (ND) from another link (via track commands).

Then there are also those dreaded silent devices (that we don't talk 
much about but which are generally plonked on the very most critical 
link into the main data centre), such as network intrusion detection 
systems and firewalls, that actively monitor traffic across their links, 
but that don't take part in any official routing protocol exchanges, and 
can fail over to a back up system without informing anyone else by 
marking interfaces up and down.

Using the example of spanning tree, waiting for STP would probably mean 
waiting 35 seconds (max_age + forwarding delay) in the default case for 
the root bridge to send out topology notification BPDUs. That's a long
time in many protocols.



So I guess the question is also: do you want NUD to inform higher layers
ASAP, via ICMP unreachables, that a local link failure requires a fail
over (as Thomas seemed to suggest), or do you want ND to shut up, just
keep on retrying locally, and let those higher layer protocols hit their
own timeouts and take their own fail over actions?

Current ND seems to go the ASAP route with its 3 second timeout. 
Historically, ARP seems to go the silent route.
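
For a sense of scale, the three timers mentioned in this message can be
put side by side. This is only arithmetic on the figures already quoted
(3 unicast probes at 1 second for ND per RFC 4861, max_age 20 s plus
forward delay 15 s for STP, and the 4 hour ARP cache default, which is
implementation specific); the constant and function names are
illustrative.

```python
# Figures quoted in the message; names are illustrative only.
MAX_UNICAST_SOLICIT = 3          # RFC 4861 default number of NUD probes
RETRANS_TIMER_S = 1              # RFC 4861 default RetransTimer, seconds
STP_MAX_AGE_S = 20               # 802.1D default max_age
STP_FORWARD_DELAY_S = 15         # 802.1D default forward delay
ARP_CACHE_TIMEOUT_S = 4 * 3600   # 4 hour default cited above (vendor specific)


def nud_probe_window_s(probes=MAX_UNICAST_SOLICIT, retrans_s=RETRANS_TIMER_S):
    """Seconds spent probing before NUD gives up on a neighbor."""
    return probes * retrans_s


def stp_wait_s(max_age_s=STP_MAX_AGE_S, forward_delay_s=STP_FORWARD_DELAY_S):
    """The max_age + forward delay wait used in the STP example."""
    return max_age_s + forward_delay_s


if __name__ == "__main__":
    timers = {
        "ND (NUD probe failure)": nud_probe_window_s(),
        "STP (max_age + forward delay)": stp_wait_s(),
        "ARP (cache timeout)": ARP_CACHE_TIMEOUT_S,
    }
    # Print from fastest to slowest reaction.
    for name, seconds in sorted(timers.items(), key=lambda item: item[1]):
        print(f"{name:32s} {seconds:6d} s")
```

Lengthening the probe phase moves ND away from the 3 second end of this
range and towards the STP figure, which is the behaviour change being
asked about here.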

It just feels to me like all nodes on a common link should behave the
same way in this respect (no scientific argument, just a raw gut feeling
of deja vu and impending packet storms).

And if all nodes on the link aren't behaving the same way, don't you
still get, say, 50% of the multicasts as the partner nodes revert to the
"-" state by timing out "too fast" for that link type?

Just seems like another reason to have this as a "per link" parameter 
rather than a "per node" parameter.

Best regards,
RayH