Re: Comments on draft-yourtchenko-colitti-nd-reduce-multicast

Andrew Yourtchenko <ayourtch@cisco.com> Thu, 20 February 2014 16:56 UTC

Date: Thu, 20 Feb 2014 17:56:02 +0100
From: Andrew Yourtchenko <ayourtch@cisco.com>
To: Erik Nordmark <nordmark@acm.org>
Subject: Re: Comments on draft-yourtchenko-colitti-nd-reduce-multicast
In-Reply-To: <5305AF13.5060201@acm.org>
Message-ID: <alpine.OSX.2.00.1402201404091.12073@ayourtch-mac>
References: <5305AF13.5060201@acm.org>
User-Agent: Alpine 2.00 (OSX 1167 2008-08-23)
Archived-At: http://mailarchive.ietf.org/arch/msg/ipv6/80KaSmyL7kMQ3zUlYs57ufgAy8o
Cc: IETF IPv6 <ipv6@ietf.org>

On Wed, 19 Feb 2014, Erik Nordmark wrote:

>
> Thanks for putting this draft together. Some questions and comments below.
>
> In section 3 you have:
> * One solicited RA per host joining the network (if solicited RAs are sent 
> using multicast)
> That isn't entirely true. Because of the rate limiting of multicast RAs if a 
> lot of hosts join at the same time, then there will be less.

agreed. If more than one host joins within a ~3-second interval, only one 
multicast RA will be sent. I'll update the next rev with this 
clarification.
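
(For reference, that ~3 seconds is the MIN_DELAY_BETWEEN_RAS constant from 
RFC 4861. A minimal sketch of the resulting rate limit, with illustrative 
names rather than any particular implementation:)

import time

MIN_DELAY_BETWEEN_RAS = 3.0   # seconds, per RFC 4861 protocol constants
_last_mcast_ra = float("-inf")

def may_send_multicast_ra(now=None):
    """Return True if a multicast RA may be sent now, i.e. at least
    MIN_DELAY_BETWEEN_RAS seconds have elapsed since the previous one."""
    global _last_mcast_ra
    now = time.monotonic() if now is None else now
    if now - _last_mcast_ra >= MIN_DELAY_BETWEEN_RAS:
        _last_mcast_ra = now
        return True
    return False

So any burst of RSs inside that window gets answered by (at most) one 
multicast RA.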

>
> In section 4.2 we can suggest an implementation that does unicast:
> If we believe that the flash crowd case is still important (when we wrote ND 
> the concern was around a building power failure and recovery - today it might 
> be a stadium setting), then one could implement a mix as follows:
> - When RS received start timer.
> - If N additional RSs arrives before timer fires, then multicast one RA. 
> Otherwise when timer fires unicast RAs to the received RSs.
> For N=1 this is trivial.

Yeah, it's a very useful optimization to consider; I think we discussed it 
already. The reason I did not include it yet was to go over the tradeoffs 
without the pressure of the submission deadline being a few hours away :)

1) From the efficiency standpoint this approach is fantastic for both the 
wired and the wireless case. (For implementations that do the mcast->ucast 
conversion over the air the win is a bit smaller, but there is still a win, 
since we send only one packet on the wired side instead of many and save 
CPU on the gateway.)

2) This adds an instantaneous behavior change to the network under peak 
load conditions - so if that code path has a problem, it creates a 
hard-to-debug situation.

3) There are a few trickier scenarios with L3 roaming (hosts arriving 
from other subnets onto the same AP) when that AP has a single 802.11 
group encryption key; this behavior might make it easier to accidentally 
break them.

But all in all, it can be very useful to have a knob that allows sending a 
multicast RA at the time when the gateway can't handle the flood of 
incoming RSs anyway and has to drop them.
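
For the record, here is a rough sketch of the coalescing logic you describe 
(Python, with invented names and an arbitrary window; send_unicast_ra / 
send_multicast_ra stand in for whatever the router's real RA transmit path 
is):

import threading

COALESCE_WINDOW = 1.0   # seconds to wait after the first RS (arbitrary)
N_THRESHOLD = 1         # N additional RSs in the window => one multicast RA

class RsCoalescer:
    def __init__(self, send_unicast_ra, send_multicast_ra):
        self.send_unicast_ra = send_unicast_ra
        self.send_multicast_ra = send_multicast_ra
        self.pending = []     # source addresses of RSs seen in this window
        self.timer = None
        self.lock = threading.Lock()

    def on_rs(self, src_addr):
        with self.lock:
            self.pending.append(src_addr)
            if self.timer is None:
                # First RS of a potential burst: arm the window timer.
                self.timer = threading.Timer(COALESCE_WINDOW, self._fire)
                self.timer.start()

    def _fire(self):
        with self.lock:
            pending, self.pending, self.timer = self.pending, [], None
        if len(pending) > N_THRESHOLD:
            # Flash crowd: one multicast RA covers everyone at once.
            self.send_multicast_ra()
        else:
            # Normal case: answer each solicitor individually.
            for src in pending:
                self.send_unicast_ra(src)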


>
> In section 4.3 there are two suggestions: MLD snooping and L2 unicast or L3 
> multicast packets. I think there are some operational concerns around MLD 
> snooping - I hope others can fill in some information in that space since I 
> don't know the issues.
> (Editorially it would be clearer if those two are separate sub-sections.)
>
> The multicast-over-unicast refers to SAVI as a way to collect the state, but 
> it doesn't specify where you see that state used. Presumably for DAD (and NS 
> in general unless you also do section 4.7)? But SAVI doesn't claim to have 
> all addresses since it is concerned with conflicts. Thus a host can exist and 
> be silent - but DAD would still expect to reach it. Or a host could have 
> moved to a different port and SAVI having stale state - yet DAD should work. 
> My point is that the details of how the state is 1) maintained and 2) used to 
> forward packets are key to analyzing how the neighbor discovery functionality 
> and robustness would be affected by this idea.

The hosts that are "quiet" are indeed a problem. However, I think it might 
be very OS-specific - the today's smartphones are anything but quiet :-)
So I think this will be much more of an issue for some special-purpose 
devices, which are also predominantly servers.

I think the smart-meter type of device would be in this category. But to 
access those, the remote party will need to know their address... so, if 
a device does some sort of service registration upon boot-up, then it 
will not be a quiet device anymore.

I haven't seen this failure in the "consumer portable devices" networks 
that I ran - so this is purely a mental construct.

I do agree that there are limitations - but I think it is very important 
to collect the data about the scope of these limitations, to keep a 
balanced approach.

The data I have as of today, based on 5-6 networks with 10-20K hosts, says 
the impact is quite minimal. But I reserve the right to be wrong and thus 
would be very interested to hear about other live networks where it is 
different.

(Side note: if these sleepy hosts are quiet and want their address to be 
known, they might consider using DHCPv6 as well.)

>
> In section 4.4 the document refers to proxy but without specifying which of 
> the different proxy approaches you have in mind.
> There is the proxy ND RFC, and there is the DAD proxy internet-draft in 6man. 
> Are you referring to one of those, or some slightly different form of proxy? 
> (For example, ND proxy would respond with the LLA of the router/AP, but that 
> might result in host movement looking like a duplicate address - again 
> depends on the details.)

I don't know the spec. I saw it implemented in a product, where two 
attached clients A and B saw, during ND, packet sequences that were close 
enough to the ND spec to achieve the identical practical result and be 
compliant to the letter of it, yet allowed the process to be performed 
more optimally from the broader perspective.

Arguably this kind of behavior would not require its own spec?

>
> I don't have any issue with removing the somewhat arbitrary 9000 second max 
> AdvDefaultLifetime in section 5.1. However, the tradeoff for what default 
> lifetime to use in section 4.5 needs to take into account one additional 
> factor.
> The default lifetime serves to garbage collect entries from the default 
> router list should a router silently disappear. Thus for links that do not 
> have a fixed (set of) link-local address(es) for the router(s), having a high 
> default lifetime means that after a failure the hosts would have one entry in 
> the default router list which is unreachable - until that high lifetime 
> expires. I don't know if there has been a study on the performance impact 
> that would have on the hosts, e.g., how often they would re-probe the default 
> router.

This is indeed a useful consideration that is worth noting in a doc that 
would be removing the 9000-second max limit. Mind if I sketch a -00 on 
this matter and send you the possible text?

>
> Section 4.6 has the same concern. But 4.5 and 4.6 make lots of sense e.g., 
> in a VRRP deployment where the link-local address of the virtual default 
> router would always be the same. Ditto for networks with a single-point-of-
> failure single router at a fixed address (e.g., if the router is always at 
> fe80::1 or some other fixed address.) Thus I think we should recommend 4.5 
> and 4.6 within that applicability. An added benefit is that the routers 
> control it, hence the operator of the network can set the values higher for 
> VRRP or single router cases.

I totally agree! We might well have a series of documents for different 
types of use cases - this could be very helpful for the folks deploying 
them. We could turn this doc into a "large-scale high-density WiFi 
(stadiums, exhibitions, campuses, etc.)" document and then have a couple of 
others describing other types of deployments. Let's discuss this in London?

>
> Section 4.7 talks about limiting the table space on the hosts. But clearing 
> the on-link bit in the advertised prefixes also has the benefit of removing 
> any multicast NS sent from hosts for non-link-local destinations. And per RFC

Yes, it's an omission by implication. Very good thing to note though, will 
fix, thanks!

> 4861 the table will be filled in by the router sending redirects. Thus the 
> router can be used to control which addresses get in the tables on the hosts 
> by choose whether and when to send the redirects (redirects with 
> target==destination).

Absolutely, thanks for this mention, I forgot to write this.
Yes, this section is incomplete without mentioning that the routers have 
to have the redirects off, indeed.

>
> In think it would be good to separate out the second half of section 4.7 
> (blocking link-locals) into a separate section, since it is quite different 
> than clearing the on-link bit. That idea has significant implications since 
> it changes the IP subnet model (RFC 4903 talks about this.) I'm not saying that 
> we shouldn't consider this, but I do think it would fall in a very different 
> category than the other ideas in this draft. Might even be best to have a 
> separate draft on this radical idea so it can be explored fully.

I thought RFC 5942 was clarifying this (by the way, I need to reference it).

My thought process was as follows: suppose we have 9000 people, each with 
a smartphone, wandering inside a large building or on campus. We want to 
limit the broadcast domains. So we will split these 9000 hosts into 256
subnets, each with 35-36 hosts. Those are small enough to be close to what 
is a typical "small network" so everything will be fine, and the protocols 
which do use link-local addresses will actually be able to function.

Good? Seems so, but there are problems with this approach:

1) "Lobby ambassador" problem: everyone arrives through the same entrance, 
and turns on their portable device there. Therefore, the assignment of a 
host to the subnet must happen there, not based on physical location, but 
on some loadbalancing algorithm. I have 256 subnets, so e.g. for 
predictability we can use a first byte of a hash of the host's mac 
address.

2) I need to span those 256 VLANs across the entire venue, and ensure I 
place each of the hosts into the correct VLAN on each AP. A typical network 
of this scale will have about 300 APs. I still need to send RAs within each 
VLAN, and I need to ensure that the hosts do not receive the "wrong" RAs. 
(And remember, there is not really a way in a naive 802.11 implementation 
to do "selective" multicast.)

3) I have people that are able to see each other using multicast 
advertisements, but what relation do they have to each other? Equality of 
the first byte of the hash of their MAC addresses. Not very useful.
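
Here is the toy version of that assignment from point 1 (the hash choice 
and the VLAN numbering are made up for illustration):

import hashlib

NUM_SUBNETS = 256
VLAN_BASE = 100   # hypothetical id of the first per-subnet VLAN

def subnet_for_mac(mac):
    """Subnet index 0..255: the first byte of a hash of the host's MAC."""
    digest = hashlib.sha1(mac.lower().encode()).digest()
    return digest[0] % NUM_SUBNETS

def vlan_for_mac(mac):
    return VLAN_BASE + subnet_for_mac(mac)

# Two devices end up in the same subnet only because their hashes happen
# to share a first byte - which tells us nothing meaningful about them.
print(vlan_for_mac("00:11:22:33:44:55"), vlan_for_mac("66:77:88:99:aa:bb"))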

So I conclude that while splitting the whole crowd into smaller subnets 
somewhat takes care of the floods, at the expense of additional complexity, 
the resulting "working" link-local multicast protocols do not really make 
any sense - because the partitioning of the network has no logic other than 
pure load balancing.

So, I decide to explore the dimension of optimizing the multicast-limiting 
properties, and increase the number of subnets.

Extrapolating, we get that having a subnet per host seems to be the most 
logical approach. (Actually there is even a shipping product that does 
exactly this - per-host unicast-only RAs with different /64s.)

But since I have only one /64, I decide that not-on-link+no-redirect 
should give me the same functionality within a much smaller address space.

So, rather than having a bunch of per-host /64s with a router forwarding 
between them, I get a bunch of per-host /128s and a much more compact 
solution - because now I need to send only *one* RA per AP, and even if it 
needs to be a multicast one, it is not a catastrophe.
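
To make the "not-on-link" part concrete, this is roughly what such an RA 
looks like on the wire - a scapy sketch with placeholder addresses and 
lifetimes, advertising the prefix with A=1 but L=0 so hosts still form 
addresses yet send all traffic via the router instead of multicast-resolving 
their neighbors:

from scapy.all import Ether, IPv6, ICMPv6ND_RA, ICMPv6NDOptPrefixInfo, sendp

ra = (
    Ether(src="02:00:00:00:00:01")            # router MAC (placeholder)
    / IPv6(src="fe80::1", dst="ff02::1")      # all-nodes, or a unicast dst
    / ICMPv6ND_RA(routerlifetime=9000)
    / ICMPv6NDOptPrefixInfo(
        prefix="2001:db8:0:1::", prefixlen=64,
        L=0,                                  # not on-link: no NS between peers
        A=1,                                  # still usable for SLAAC
        validlifetime=7200, preferredlifetime=3600,
    )
)
# sendp(ra, iface="eth0")   # interface name is, of course, deployment-specific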

Of course, I need to disclaim: doing this in the home network with one 
AP and < 100 devices all belonging to a single person is not useful at 
all.

Doing this in a network with 9000 devices belonging to different people is 
where it made the most sense.

Now that we've cut off all the services, we need to figure out how to 
bring them back... And I think the further mental construct that would be 
useful here is to consider this as a giant homenet with a lot of guests. 
Thus, from the service discovery standpoint, I think the products of the 
DNS-SD working group will be very beneficial here.


>
> 4.8 seems to conflate the address assignment with DAD. Just because we might 
> want to centralize the DAD checks doesn't imply that we want to remove the 
> ability for the host to pick its own privacy enhanced interface-IDs to form 
> its addresses.
> From a deployment perspective DHCPv6 is available for address assignment, but 
> I don't think we want to require that for WiFi or other links which have packet 
> loss.

This is more of a fallback scenario for those who want a 100% guarantee 
of address uniqueness in the network - using the existing mechanism.

Also as a trigger for another small discussion:

I think it's worth modeling the real-world experience in networks with 
varying packet loss to see at which point it stops being usable, and going 
from there.

The classic IETF "Built-in hotel WiFi" topic: is it worth being extra-sure 
that DAD works in that scenario ? Or is it worth fixing the connectivity 
by bringing the loss down to acceptable level ? At how many retransmits do 
we declare a failure ? Could we then explore a similar approach to 
make the DAD more robust instead - keeping in mind the probability of 
failure should be quite low ?

This would be a good robustness improvement for the hosts, one that 
benefits them immediately.
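
To put rough numbers on the retransmit question (assuming independent 
per-packet loss; the loss rates below are just examples):

# If each DAD probe (or the answer it would have triggered) is lost
# independently with probability p, then after k transmits the chance of
# missing an existing duplicate is p**k.
for p in (0.05, 0.20, 0.50):        # example per-packet loss rates
    for k in (1, 2, 3, 5):          # number of DAD transmits
        print("loss=%.0f%% transmits=%d -> miss probability %.2e"
              % (p * 100, k, p ** k))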

> (Packet loss occurs on wired networks as well, but the drop 
> distribution is different - might happen during spanning tree reconvergence 
> etc.)
> Note that DHCPv6 (RFC 3315) has a SHOULD for doing DAD on the addresses 
> received from the DHCP server - needed since the server could be confused.
>

yeah.

> I don't understand 4.9. Should I read it as a host shutting things down if 
> it goes to sleep even for a short time, and then waking up and multicasting a

Depends on the definition of "short". Some hosts nap during the ~100ms 
interval between 802.11 beacons - that obviously would not work here. 
Maybe don't do it within 5 seconds of turning the screen off. But I 
certainly know my gadgets stop being reachable if I do not touch them for 
a day.

> few RSs to get RAs, then multicasting a DAD probe for each assigned IPv6 
> address? If all the hosts did that then I think there would be more 
> multicasts.
> Note that even if the host doesn't revert to multicasting RSs, it still needs 
> to be concerned about a duplicate address having arrived on the link while it 
> was asleep (and not responding to any DAD probes.)
>
> The suggestion in section 5.2 needs a lot more work. I tried to work out 
> some of these issues in section 8.9 in 
> draft-chakrabarti-nordmark-6man-efficient-nd-05.
> But even if we figure out how to do that without causing lots of unneeded RS 
> load on the routers, in the case of your draft wouldn't the router(s) still 
> need to multicast RAs at the same frequency? The router(s) have no way to 
> know whether all the hosts on the link initiate RS to refresh RA information.

Sure, it needs more whiteboarding. But if we assume one MAC address = one 
stack (a fair assumption, I think?), it should be possible. Here is a raw 
idea:

From the fact that the destination of the RS is unicast, the router knows 
the host is "resolicit-capable". The source address contains the MAC 
address, so there is sufficient information to create a neighbor entry 
with a "resolicit-capable" flag in it.

Subsequent neighbor entries with the same MAC address will inherit the 
"resolicit-capable" flag.

(Data structure detour: we don't have to index by MAC address to do this. 
Take two counting Bloom filters, "legacy hosts' MAC addresses" and 
"resolicit-capable hosts' MAC addresses". In case of a false-positive 
match, prefer membership in the "legacy hosts" filter to avoid a blackout.)

Keep running counters of the number of "resolicit-capable" and "legacy" ND 
entries.

When it's time to send a periodic RA, look at the counters; if the count 
of "legacy" hosts is zero, there is no need to send it. (A sketch of this 
bookkeeping follows.)

This algorithm gives multiple deployment choices:

1) "legacy" mode:

set the RA interval to 1/4 of the router lifetime. This way none of the 
hosts ever reaches half of the remaining time, resolicits will never be 
sent, and it's business as usual, same as today.

2) "resolicit-capable" mode:

set the RA interval to 3/4 of the router lifetime. By that time we will 
have seen RSs from all the resolicit-capable nodes, so we can decide 
whether to send a multicast RA or not.

This of course needs to be worked on in order to avoid synchronization 
issues (i.e. the host cannot just blast an RS at exactly half the 
lifetime). But they are solvable: even assuming the existing router 
lifetime limit of 2.5 hours and an RA interval of 1.5 hours, we have room 
for +/- 0.5 hour of jitter.

Taking an arbitrary guess of 180,000 hosts within that /64, a uniformly 
distributed jitter will give ~100 pps of RSs, which is more than manageable.
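
(Napkin arithmetic behind that number, assuming the resolicits spread over 
roughly a half-hour of that jitter window:)

hosts = 180000
spread_window_s = 30 * 60                  # ~0.5 hour of jitter to spread over
print(hosts / float(spread_window_s))      # -> 100.0 RSs per second on average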

NB: this is a very first napkin sketch, so it is fairly simplistic. But I 
think if we were to tinker a bit more with the timers, this can be made to 
work.

There is also a question: what happens if the router has to clean up a 
legacy host's ND entry due to resource constraints? This can be taken care 
of by temporarily going into "legacy only" mode for a period of a couple 
of RA lifetimes.

Of course, it requires the sleepy hosts to wake up every now and then. 
But with a lifted router lifetime limit, it will be every 9 hours.

Probably doable, but let's take a SWAG just to sanity-check: take 
http://www.ti.com/lit/ds/symlink/cc3000.pdf and suppose it were one day to 
implement IPv6 with similar power consumption.

I count the 802.11g numbers, because of the more efficient spectrum usage. 
This gives us a maximum power consumption of 207 mA and a shutdown current 
of 5 uA.

Now, assume we couple it with something like 
http://www.atmel.com/Images/Atmel-2586-AVR-8-bit-Microcontroller-ATtiny25-ATtiny45-ATtiny85_Datasheet.pdf
which has a standby current of 0.1 uA and an active consumption of 12 mA 
(graph on page 173 of that PDF).

Let's say 20 seconds should be enough to make all the necessary 802.11 
arrangements, send the data, send the solicitation, and receive the 
advertisement.

Now, assuming we power it off typical NiMH 1.2V cells with 1800 mAh 
capacity, the continuous running time off it will be 1800*3600/220 = 29454 
seconds.

Let's assume for simplicity that we wake up every 8 hours; at 20 seconds 
per wakeup this gives 60 seconds of active time per day, which gives us an 
active life span of about 490 days.

Given that even the low self-discharge batteries 
(http://en.wikipedia.org/wiki/Nickel%E2%80%93metal_hydride_battery#Low_self-discharge_cells) 
have a retention rate of 70%-85% within a year, this is probably a 
reasonable lifecycle (also for this reason I am not accounting for the 
standby current; it's comparable to the self-discharge).
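
(The same SWAG as a few lines of Python, so it's easy to re-check or plug 
in other datasheet numbers; above I round the active draw to 220 mA:)

radio_ma = 207            # CC3000 max consumption, 802.11g, per the datasheet
mcu_ma = 12               # ATtiny-class MCU active consumption
battery_mah = 1800        # one NiMH cell, 1.2 V, 1800 mAh
awake_s_per_wakeup = 20   # assumed: associate, solicit, send data, get RA
wakeups_per_day = 3       # wake every 8 hours

active_ma = radio_ma + mcu_ma                            # ~220 mA while awake
runtime_s = battery_mah * 3600.0 / active_ma             # seconds of continuous run
awake_s_per_day = awake_s_per_wakeup * wakeups_per_day   # 60 s per day
print(runtime_s / awake_s_per_day)                       # -> about 490 days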

This is of course also a quick napkin sketch; a properly engineered 
approach would take into account the 802.11 maintenance traffic - and it's 
really the "extreme" case which does not listen to packets while asleep - 
so the lifetime would certainly vary. But I think this shows that this is 
not a totally unreasonable path.

--a

>
> Regards,
>    Erik
>
>
>