Re: Comments on draft-yourtchenko-colitti-nd-reduce-multicast

Erik Nordmark <nordmark@acm.org> Tue, 25 February 2014 11:59 UTC

Message-ID: <530C85A6.5080404@acm.org>
Date: Tue, 25 Feb 2014 03:59:34 -0800
From: Erik Nordmark <nordmark@acm.org>
To: Andrew Yourtchenko <ayourtch@cisco.com>
Subject: Re: Comments on draft-yourtchenko-colitti-nd-reduce-multicast
References: <5305AF13.5060201@acm.org> <alpine.OSX.2.00.1402201404091.12073@ayourtch-mac>
In-Reply-To: <alpine.OSX.2.00.1402201404091.12073@ayourtch-mac>
Archived-At: http://mailarchive.ietf.org/arch/msg/ipv6/CFxfAoSWbktKwHYgSmIRqNVxcWE
Cc: IETF IPv6 <ipv6@ietf.org>

On 2/20/14 8:56 AM, Andrew Yourtchenko wrote:
>
> Yeah, it's a very useful optimization to consider; we discussed it 
> already, I think. The reason I did not include it yet was to go over the 
> tradeoffs without the pressure of the submission deadline being a few 
> hours away :)
>
> 1) from the efficiency standpoint this approach is fantastic for the 
> wired and for the wireless (for implementations that do the 
> mcast->ucast conversion over the air the win is a bit less, but 
> nonetheless there is a win since we send only one packet on the wired 
> side instead of many and save the CPU on the gateway).
>
> 2) This adds an instantaneous behavior change into the network at the 
> peak load conditions - so if that code path has a problem, this 
> creates a hard-to-debug situation.
>
> 3) There are a few trickier scenarios with L3 roaming (hosts arriving 
> from other subnets onto the same AP) and that AP having a single 
> 802.11 group encryption key, which this behavior might make easier to 
> accidentally have broken.
I don't quite understand concern #3 (and #2). Since the routers don't 
know whether hosts rely on unsolicited multicast RAs, the multicast RAs 
must function. Thus adding the flash crowd optimization to multicast 
solicited RAs wouldn't be anything new or different - it would merely 
appear in one additional case.
>>
>> In section 4.3 there are two suggestions: MLD snooping and L2 unicast 
>> of L3 multicast packets. I think there are some operational concerns 
>> around MLD snooping - I hope others can fill in some information in 
>> that space since I don't know the issues.
>> (Editorially it would be clearer if those two are separate 
>> sub-sections.)
>>
>> The multicast-over-unicast refers to SAVI as a way to collect the 
>> state, but it doesn't specify where you see that state used. 
>> Presumably for DAD (and NS in general unless you also do section 
>> 4.7)? But SAVI doesn't claim to have all addresses since it is 
>> concerned with conflicts. Thus a host can exist and be silent - but 
>> DAD would still expect to reach it. Or a host could have moved to a 
>> different port and SAVI having stale state - yet DAD should work. My 
>> point is that the details of how the state is 1) maintained and 2) 
>> used to forward packets are key to analyzing how the neighbor 
>> discovery functionality and robustness would be affected by this idea.
>
> The hosts that are "quiet" are indeed a problem. However, I think it 
> might be very OS-specific - today's smartphones are anything but 
> quiet :-)
> So I think this will be much more of an issue for some special-purpose 
> devices, which are also predominantly servers.
I guess my first question was how this would work when there is no 
SAVI-like state for an address. (In that case you don't have a unicast 
address to which you can send the packet.) Would you multicast or drop?
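
To make the "multicast or drop" question concrete, here is a minimal sketch of the forwarding decision. Everything in it is hypothetical: the binding table layout (target address mapped to a MAC address and port), the function name forward_ns and the MissPolicy switch are illustrations, not anything the draft or SAVI defines.

```python
from enum import Enum
from typing import Dict, Optional, Tuple

# Hypothetical binding: (link-layer address, switch port).
Binding = Tuple[str, int]


class MissPolicy(Enum):
    MULTICAST = "multicast"  # fall back to ordinary L2 multicast delivery
    DROP = "drop"            # discard the packet when no state exists


def forward_ns(target: str,
               binding_table: Dict[str, Binding],
               miss_policy: MissPolicy) -> Tuple[str, Optional[Binding]]:
    """Decide how to deliver a multicast NS (or DAD probe) for `target`."""
    binding = binding_table.get(target)
    if binding is not None:
        # SAVI-like state exists: deliver as L2 unicast to the known host.
        return ("unicast", binding)
    if miss_policy is MissPolicy.MULTICAST:
        # No state (silent host, or stale entry already purged).
        return ("multicast", None)
    return ("drop", None)
```

With MissPolicy.DROP a silent host is never reached by DAD, and with MissPolicy.MULTICAST the saving only applies to addresses the device has already seen, which is the tradeoff behind the question.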
> I think the smart-meter type of device would be in this category. But 
> to access those, the remote party will need to know their address... 
> so, if a device does some sort of service registration upon boot-up, 
> then it will not be a quiet device anymore.
But the service registration could have happened when the device was 
first installed - I don't think that is likely to happen each time the 
device wakes up.
> I haven't seen this failure in the "consumer portable devices" 
> networks that I ran - so this is purely a mental construct.
Yes, but IPv6 needs to work for other sleepy devices. We want whatever 
improvements we do to the standards in this space to be useful for the 
next 20 years or so. (Products, whether end devices or routers/switches, 
can operate on shorter time scales and with more limited applicability 
than the core standards.)

>
> I do agree that there are limitations - but I think it is very 
> important to collect the data about the scope of these limitations, to 
> keep a balanced approach.
>
> The data I have as of today based on 5-6 networks with 10-20K hosts 
> says it is quite minimal. But I reserve the right to be wrong and thus 
> would be very interested to hear about other live networks where it is 
> different.
>
> (side note: if these sleepy hosts are quiet and are concerned that 
> their address be known, they might consider to use DHCPv6 as well).
Using DHCPv6 doesn't reduce multicast (neither for address resolution 
nor for DAD). One could change DHCP to record the link-layer addresses 
so that the routers would ask the DHCP servers for the link-layer 
addresses, but that seems like a lot of change.

Also, see my response to Ole on the other ways that the DHCPv6 we 
currently have doesn't help solve the problems at hand. [I see others 
brought up the same concerns around DHCPv6 on the list.]

>
>>
>> In section 4.4 the document refers to proxy but without specifying 
>> which of the different proxy approaches you have in mind.
>> There is the proxy ND RFC, and there is the DAD proxy internet-draft 
>> in 6man. Are you referring to one of those, or some slightly 
>> different form of proxy? (For example, ND proxy would respond with 
>> the LLA of the router/AP, but that might result in host movement 
>> looking like a duplicate address - again depends on the details.)
>
> I don't know the spec. I saw it implemented in a product, where two 
> attached clients A and B during the ND saw the packet sequences that 
> were close enough to the ND spec to achieve identical practical result 
> and be compliant to the letter of it, yet allowed to perform the 
> process more optimally from the broader perspective.
>
> Arguably this kind of behavior would not require its own spec ?
Yes, because depending on the details it might break SeND and other 
ND options, or it might only be robust and useful within very limited 
applicability, for instance in specific topologies (such as a single 
wireless controller).

>> I don't have any issue with removing the somewhat arbitrary 9000 
>> second max AdvDefaultLifetime in section 5.1. However, the tradeoff 
>> for what default lifetime to use in section 4.5 needs to take into 
>> account one additional factor.
>> The default lifetime serves to garbage collect entries from the 
>> default router list should a router silently disappear. Thus for 
>> links that do not have a fixed (set of) link-local address(es) for 
>> the router(s), having a high default lifetime means that after a 
>> failure the hosts would have one entry in the default router list 
>> which is unreachable - until that high lifetime expires. I don't know 
>> if there has been a study on the performance impact that would 
>> have on the hosts, e.g., how often they would re-probe the default 
>> router.
>
> This is indeed a useful consideration that is worth noting in a doc 
> that would be removing the 9000 seconds max limit. Mind if I sketch a 
> -00 on this matter and send you the possible text ?
That would be excellent.

>> Section 4.6 has the same concern. But 4.5 and 4.6 make lots of 
>> sense e.g., in a VRRP deployment where the link-local address of the 
>> virtual default router would always be the same. Ditto for networks 
>> with a single point of failure single router at a fixed address 
>> (e.g., if the router is always at fe80::1 or some other fixed 
>> address.) Thus I think we should recommend 4.5 and 4.6 within 
>> that applicability. An added benefit is that the routers control it, 
>> hence the operator of the network can set the values higher for VRRP 
>> or single router cases.
>
> I totally agree! We probably might have a series of documents for 
> different types of use cases - this could be very helpful for the 
> folks deploying them. We could turn this doc into a "large-scale 
> high-density WiFi (stadiums, exhibitions, campuses, etc.)" and then 
> have a couple of others describing other types of deployments. Let's 
> discuss this in London ?
Yep.
>> I think it would be good to separate out the second half of section 
>> 4.7 (blocking link-locals) into a separate section, since it is quite 
>> different than clearing the on-link bit. That idea has significant 
>> implications since it changes the IP subnet model (RFC 4903 talks 
>> about this.) I'm not saying that we shouldn't consider this, but I do 
>> think it would fall in a very different category than the other ideas 
>> in this draft. Might even be best to have a separate draft on this 
>> radical idea so it can be explored fully.
>
> I thought RFC5942 was clarifying this (by the way I need to reference 
> it).
5942 re-states the intent of 4861 with more detail. But doesn't change 
the above semantics. (It does remove these two bullets from on-link:
        *  a Neighbor Advertisement (NA) message is received for the
           (target) address, or
        *  any Neighbor Discovery message is received from the address.
)

The only case I know of where "blocking link-locals" is part of our 
standards is for multi-link subnets, for instance in 6lowpan. But in 
that case it was part of the architecture from the start. The issue we 
have with the general ND evolution is that we need to support the 
existing assumptions, and many of those assumptions come from IPv4. For 
instance, the use and semantics of an IPv4 link-local multicast has been 
carried to IPv6 link-locals.

Removing that will cause some breakage.
>
> My thought process was as follows: suppose we have 9000 people, each 
> with a smartphone, wandering inside a large building or on campus. We 
> want to limit the broadcast domains. So we will split these 9000 hosts 
> into 256
> subnets, each with 35-36 hosts. Those are small enough to be close to 
> what is a typical "small network" so everything will be fine, and the 
> protocols which do use link-local addresses will actually be able to 
> function.
>
> Good ? Seems so, but, there are problems with this approach:
>
> 1) "Lobby ambassador" problem: everyone arrives through the same 
> entrance, and turns on their portable device there. Therefore, the 
> assignment of a host to the subnet must happen there, not based on 
> physical location, but on some loadbalancing algorithm. I have 256 
> subnets, so e.g. for predictability we can use a first byte of a hash 
> of the host's mac address.
>
> 2) I need to span those 256 vlans across the entire venue, and ensure 
> I place each of the hosts into the correct VLAN on each AP. A typical 
> network of such a scale will have about 300 APs. I still need to send 
> RAs within each vlan. I need to ensure that the hosts do not receive 
> the "wrong" RAs. (and remember, there is not really a way in a naive 
> 802.11 implementation to do "selective" multicast).
>
> 3) I have people that are able to see each other using multicast 
> advertisements, but what relation do they have to each other ? 
> Equality of the first byte of the hash of their mac address. Not very 
> useful.
>
> So I conclude, that while splitting the whole crowd into smaller 
> subnets somewhat takes care of the floods at the expense of additional 
> complexity, the resulting "working" link-local multicast protocols do 
> not really make any sense - because the partition of the network does 
> not have any other logic than pure loadbalancing.
I agree with all that.
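
As an illustration of the load-balancing assignment in point 1) of the quoted text, here is a minimal sketch that maps a MAC address to one of 256 subnets using the first byte of a hash. The choice of SHA-256, the function names and the synthetic MAC addresses are my assumptions; the quoted text only says "a hash".

```python
import hashlib
from collections import Counter


def subnet_for_mac(mac: str) -> int:
    """Map a MAC address to a subnet index 0..255 via the first hash byte."""
    # Normalize so that the same interface always lands in the same subnet.
    normalized = mac.lower().replace("-", ":")
    return hashlib.sha256(normalized.encode("ascii")).digest()[0]


def occupancy(num_hosts: int = 9000) -> Counter:
    """Count hosts per subnet for synthetic, sequentially numbered MACs."""
    counts: Counter = Counter()
    for i in range(num_hosts):
        mac = "02:00:00:%02x:%02x:%02x" % (
            (i >> 16) & 0xFF, (i >> 8) & 0xFF, i & 0xFF)
        counts[subnet_for_mac(mac)] += 1
    return counts


if __name__ == "__main__":
    counts = occupancy()
    print("subnets used:", len(counts))
    print("smallest/largest subnet:", min(counts.values()), max(counts.values()))
```

The average is 9000/256, about 35 hosts per subnet, but a hash only balances statistically, so individual subnets will deviate from that. It also makes point 3) visible: the only thing two hosts in the same subnet share is a hash byte.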

For those types of deployments having /127 subnets (plus DHCPv6 PD if 
someone wants a personal area network behind their phone) makes a lot of 
sense.

But my issue is that the goals we are trying to achieve are more general 
than the stadium (or broadband subscriber) use case.

> Of course, I need to disclaim: doing this in the home network with one 
> AP and < 100 devices all belonging to a single person is not useful at 
> all.
>
> Doing this in a network with 9000 devices belonging to different 
> people is the thing that made the most sense.
Agreed.
>
> Now that we've cut off all the services, we need to figure out how to 
> bring them back... And I think the further mental construct here that 
> would be useful is to consider this as a giant homenet with a lot of 
> guests. Thus, I think from the services discovery standpoint, the 
> products of the DNS-SD workgroup will be very beneficial here.
I don't think such an approach (first break it, then get folks to fix 
whatever broke, whether that means fixing an IETF standard, some common 
implementation, or various proprietary approaches unknown to us) 
makes sense for a retrofit.

Only makes sense for a greenfield deployment where there are no 
assumptions that existing protocols and code continue to work.
>
>>
>> 4.8 seems to conflate the address assignment with DAD. Just because 
>> we might want to centralize the DAD checks doesn't imply that we want 
>> to remove the ability for the host to pick its own privacy enhanced 
>> interface-IDs to form its addresses.
>> From a deployment perspective DHCPv6 is available for address 
>> assignment, but I don't think we want to require that for WiFi or other 
>> links which have packet loss.
>
> This is more of a fallback scenario for those who want a 100% 
> guarantee of address uniqueness in the network - using the existing 
> mechanism.
But RFC 3315 doesn't guarantee uniqueness, which is why the host needs 
to perform DAD in addition.
>
> Also as a trigger for another small discussion:
>
> I think it's worth modeling the real-world experience in the networks 
> with varying packet loss to see at which point it stops being usable, 
> and go from there.
>
> The classic IETF "Built-in hotel WiFi" topic: is it worth being 
> extra-sure that DAD works in that scenario ? Or is it worth fixing the 
> connectivity by bringing the loss down to acceptable level ? At how 
> many retransmits do we declare a failure ? Could we then explore a 
> similar approach to make the DAD more robust instead - keeping in mind 
> the probability of failure should be quite low ?
>
> This will be a good robustness improvement for the hosts that will 
> immediately benefit them.
If we want a DAD probe mechanism that is more robust to failure then I 
think we should just reuse the techniques in ACD (RFC 5227) which 
performs ongoing conflict detection. However, that results in more 
broadcast/multicast messages!

If we want both fewer multicast packets *and* a more robust DAD 
(including efficiently handling DAD for sleeping nodes), then I don't 
see any approach other than making some devices on the link be able to 
speak more authoritatively about the addresses present on the link. 
Those devices can try to build that state implicitly (gleaning from 
packets on the link), or explicitly. (I'll try to capture the 
differences between those before next week.)
>
>> I don't understand 4.9. Should I read it as a host shutting things 
>> down if it goes to sleep even for a short time, and then waking up 
>> and multicasting a
>
> depends on the definition of "short". Some hosts nap during the 100ms 
> interval between the 802.11 beacons. Obviously would not work here.
> Maybe not do it within 5 seconds of turning the screen off.
> But I certainly know my gadgets stop being reachable if I do not touch 
> them for a day.
The loss of reachability is expected.
But there is also a question about the efficiency when the device wakes 
up. Whether it needs to start from scratch (multicast RS, do DAD 
multicasts and wait for a second), or whether it can do DNA (unicast a 
NS to the router(s) to check it is on the same link) and avoid waiting 
for the lack of a response to a DAD probe.
>
> Sure it needs more whiteboarding. But if we assume one MAC address = 
> one stack (this is a fair assumption I think?), it should be possible, 
> here is a raw idea:
Need to bring a whiteboard to London ... ;-)
>
> By the destination of the RS being unicast, the router knows the host 
> is "resolicit-capable". The source address contains the MAC address, 
> so it is sufficient information to create a neighbor entry with a flag 
> "resolicit-capable" in it.
Or add a flag to the RS to say "I will re-solicit" ...
>
> Subsequent neighbor entries with the same MAC address will inherit the 
> "resolicit-capable" in it.
>
> (Data structure detour: don't have to index by MAC address to do this.
> Take two counting bloom filters "legacy hosts' MAC addresses" and 
> "resolicit-capable hosts' MAC addresses". In case of a false positive 
> match, prefer the membership in the "legacy hosts" to avoid blackout).
>
> Keep the running counters of number of "resolicit-capable" ND entries 
> and "legacy" ND entries.
I can see how you increment those counters (based on seeing a RS with a 
new MAC address.)
But when do you decrement them?

(I don't think NUD from the router can be used for this, because the 
host can be unreachable from the router for a few seconds due to radio 
issues, yet the host will still consider itself connected and not 
restart by sending an RS.)
>
> When it's time to send a periodic RA, look at the counters, and if the 
> count of "legacy" hosts is zero, no need to send it.
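
The quoted "data structure detour" and counters can be sketched as follows. This is only an illustration: the class names, the filter size, the number of hash functions, the use of SHA-256 and the rule that an unknown MAC address is treated as legacy are all my assumptions. It also deliberately contains no policy for when to decrement the counters, since that question is still open.

```python
import hashlib
from typing import Iterator, List


class CountingBloom:
    """Counting Bloom filter: supports add, remove and membership tests."""

    def __init__(self, size: int = 4096, hashes: int = 3) -> None:
        self.size = size
        self.hashes = hashes
        self.counts: List[int] = [0] * size

    def _slots(self, key: str) -> Iterator[int]:
        # Derive `hashes` slot indexes from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for slot in self._slots(key):
            self.counts[slot] += 1

    def remove(self, key: str) -> None:
        for slot in self._slots(key):
            if self.counts[slot] > 0:
                self.counts[slot] -= 1

    def __contains__(self, key: str) -> bool:
        return all(self.counts[slot] > 0 for slot in self._slots(key))


class RouterState:
    """Tracks legacy vs. resolicit-capable hosts without indexing by MAC."""

    def __init__(self) -> None:
        self.legacy = CountingBloom()
        self.resolicit = CountingBloom()
        self.legacy_entries = 0
        self.resolicit_entries = 0

    def on_router_solicitation(self, mac: str, unicast_dst: bool) -> None:
        # An RS sent to a unicast destination marks the host as capable.
        target = self.resolicit if unicast_dst else self.legacy
        if mac not in target:
            target.add(mac)

    def add_neighbor_entry(self, mac: str) -> str:
        # On any doubt (false positive or unknown MAC) prefer "legacy",
        # so that a host is never starved of periodic RAs.
        if mac in self.legacy or mac not in self.resolicit:
            self.legacy_entries += 1
            return "legacy"
        self.resolicit_entries += 1
        return "resolicit-capable"

    def should_send_periodic_ra(self) -> bool:
        # Periodic multicast RAs are only needed while legacy entries exist.
        return self.legacy_entries > 0
```

Preferring "legacy" on a double match costs some extra multicast RAs but avoids the blackout case mentioned in the quoted text.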
>
> This algorithm gives multiple deployment choices:
>
> 1) "legacy" mode:
>
> set the RA interval at 1/4 of the router lifetime. This way none of 
> the hosts ever reaches the half remaining time, resolicits will never 
> be sent, business as usual same as today.
>
> 2) "resolicit-capable" mode:
>
> set the RA interval at 3/4 of the router lifetime. Then by this time 
> we will have seen the RS from all the capable nodes, so we can decide 
> whether
> to send a multicast RA or not.
Presumably the RSs can be lost. One way to handle this is by aggressive 
retransmissions (no RA after 1 second, then resend RS). Another way is 
to space the "retransmissions" inside the lifetime range e.g., by having 
the host send a RS when 1/3 of the router lifetime has passed, next when 
2/3 has passed. (Generalize to k/N for N transmissions before giving up.)
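
A minimal sketch of that second option, spacing the RSs inside the router lifetime. The function name, the jitter parameter and the reading of "k/N" as N = attempts + 1 (so that two attempts land at 1/3 and 2/3, matching the example) are my assumptions.

```python
import random
from typing import List, Optional


def resolicit_times(router_lifetime: float,
                    attempts: int,
                    jitter_fraction: float = 0.0,
                    rng: Optional[random.Random] = None) -> List[float]:
    """Return offsets (seconds after the last RA) at which to send an RS.

    Attempt k is scheduled at k/N of the lifetime with N = attempts + 1,
    so the last attempt still happens before the lifetime expires.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    if not 0.0 <= jitter_fraction < 0.5:
        raise ValueError("jitter_fraction must be in [0, 0.5)")
    rng = rng or random.Random()
    n = attempts + 1
    step = router_lifetime / n
    times = []
    for k in range(1, n):
        # Jitter of less than half a step keeps the attempts ordered
        # and avoids many hosts transmitting at the same instant.
        jitter = rng.uniform(-jitter_fraction, jitter_fraction) * step
        times.append(k * step + jitter)
    return times


if __name__ == "__main__":
    # Two attempts within a 9000 second router lifetime, no jitter.
    print(resolicit_times(9000, 2))
```

Compared with retransmitting after one second, this spreads the load over the lifetime at the cost of the host going longer before it learns that an RS was lost.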
>
> This of course needs to be worked on in order to avoid the 
> synchronization issues (i.e. the host can not just blast an RS 
> straight at half lifetime). But they are solvable: Assuming even the 
> existing router lifetime limit of 2.5 hours, and RA interval of 1.5 
> hours, we have a space +- 0.5 hour of jitter.
>
> Taking an arbitrary guess of 180000 hosts within that /64, a uniformly 
> distributed jitter will give ~100pps of RSs which is more than 
> manageable.
Your suggestions made me realize the stuff I put in section 8.9 in 
efficient-nd-05 is more complex than it needs to be. No need to worry about 
prefix and other lifetimes - sufficient to look at the default router 
lifetime and make sure (by sending RSs) that the host hears from all the 
routers. That's good.
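
As a sanity check of the quoted RS rate, here is the arithmetic as a small sketch. The function name is mine, and the two window lengths are just the two plausible readings of "+- 0.5 hour of jitter".

```python
def rs_rate_pps(hosts: int, window_seconds: float) -> float:
    """Average RS arrival rate when hosts spread uniformly over a window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return hosts / window_seconds


if __name__ == "__main__":
    hosts = 180000
    # A half-hour window reproduces the ~100 pps figure in the quoted text.
    print(rs_rate_pps(hosts, 0.5 * 3600))  # 100.0
    # Reading "+- 0.5 hour" as a full one-hour window halves the rate.
    print(rs_rate_pps(hosts, 1.0 * 3600))  # 50.0
```

Either way the average is modest. What the average does not show is the burst when many hosts hear the same multicast RA and all key their timers off it.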
>
> NB: this is a very first napkin sketch, so it is fairly simplistic. 
> But I think if we were to tinker a bit more with the timers, this can 
> be made to work.
>
> There is also a question - what happens if the router has to clean up 
> the legacy host ND entry due to resource constraints - this can be 
> taken care of by temporarily going into "legacy only" mode for a 
> period of a couple of RA lifetimes.
The router is free to just discard the NCEs - unless we change DAD to 
use them in some proxy approach.
Even if we change DAD that way, the router can just do the SAVI thing 
(send a unicast NS to the host and see if it is still there) - no need to 
multicast RAs to clean up.
However, the unicast NS cleanup (or any other cleanup driven by the 
router expecting the host to respond) has issues with sleeping hosts. (I 
just realized there might be different forms of sleep - completely off 
and will wake up based on a timer, or being woken up by arriving 
(unicast) packets. I don't know if there is confusion around that and 
whether we need different terms to make it more clear.)
>
> Of course, it requires the sleepy hosts to wake up every now and then. 
> But with a lifted router lifetime limit, it will be every 9 hours.
Assuming DAD is handled without involving the hosts.
>
> Probably doable, but let's take a SWAG just to sanity check:
> http://www.ti.com/lit/ds/symlink/cc3000.pdf and suppose they were one 
> day to implement IPv6 with the similar power consumption.
>
> I count the 802.11g, because of more efficient spectrum usage. This 
> gives us maximum power consumption of 207mA, and a shutdown current of 
> 5uA.
>
> Now, assuming we couple it with something like 
> http://www.atmel.com/Images/Atmel-2586-AVR-8-bit-Microcontroller-ATtiny25-ATtiny45-ATtiny85_Datasheet.pdf
> which has standby of 0.1uA, and 12mA of active consumption (graph at 
> page 173 of this pdf)
>
> Let's say 20 seconds should be enough to make all the necessary 802.11 
> arrangements, send the data, send the solicitation, and receive 
> advertisement.
That leaves plenty of time to do the current DAD.

However, Margaret had some numbers a while back with lots more frequent 
wakeups, but with very short runtime. I don't know if that was captured 
in RFC 6574 or in Margaret's slides from the workshop. In any case, one 
issue I remember is that for IPv4 the runtime was a lot less than one 
second, but with IPv6 there was an additional 1 second to wait for DAD 
to complete which blew the power budget.

    Erik
>
> Now, assuming we power it off the typical NiMh 1.2V elements with 
> 1800mAh capacity, the continuous running time off it will be 
> 1800*3600/220 = 29454 seconds.
>
> Let's assume for simplicity that we wake up every 8 hours, so this 
> gives 60 seconds per day. This gives us the active life span of 490 days.
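
The quoted battery arithmetic can be reproduced with a short sketch. The function names are mine. The 220 mA draw is the figure used in the quoted text (presumably the 207 mA radio plus the 12 mA microcontroller, rounded), and standby current and self-discharge are ignored, as in the quoted estimate.

```python
def continuous_runtime_seconds(capacity_mah: float, draw_ma: float) -> float:
    """Seconds of continuous operation a battery can sustain at `draw_ma`."""
    return capacity_mah * 3600.0 / draw_ma


def lifespan_days(capacity_mah: float, draw_ma: float,
                  awake_seconds: float, wakeups_per_day: float) -> float:
    """Days of operation if the device only draws current while awake."""
    awake_per_day = awake_seconds * wakeups_per_day
    return continuous_runtime_seconds(capacity_mah, draw_ma) / awake_per_day


if __name__ == "__main__":
    # 1800 mAh cell, 220 mA active draw: about 29454 seconds of runtime.
    print(continuous_runtime_seconds(1800, 220))
    # 20 seconds awake, 3 wakeups per day (every 8 hours): about 490 days.
    print(lifespan_days(1800, 220, 20, 3))
```

The result scales inversely with the awake time, so a one-second DAD wait added to a sub-second wakeup has a much larger relative effect than it does on a 20 second one.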
>
> Given that even the low self-discharge batteries 
> (http://en.wikipedia.org/wiki/Nickel%E2%80%93metal_hydride_battery#Low_self-discharge_cells) 
> have a retention rate of 70%-85% within a year, probably this is a 
> reasonable lifecycle (also for this reason I am not accounting for the 
> standby current, it's comparable to self-discharge).
>
> This is of course also a quick napkin sketch, a properly engineered 
> approach would take into account the 802.11 maintenance stuff - and, 
> also it's really the "extreme" case which does not listen to the 
> packets while asleep - so the lifetime would certainly vary - but I 
> think this shows that this is not a totally unreasonable path.
>
> --a
>
>>
>> Regards,
>>    Erik
>>
>>
>>
>