Re: Comments on draft-yourtchenko-colitti-nd-reduce-multicast

Andrew Yourtchenko <ayourtch@cisco.com> Fri, 28 February 2014 15:27 UTC

Date: Fri, 28 Feb 2014 16:26:35 +0100
From: Andrew Yourtchenko <ayourtch@cisco.com>
To: Erik Nordmark <nordmark@acm.org>
Subject: Re: Comments on draft-yourtchenko-colitti-nd-reduce-multicast
In-Reply-To: <530C9CFD.2000409@acm.org>
Message-ID: <alpine.OSX.2.00.1402272036140.42594@ayourtch-mac>
References: <5305AF13.5060201@acm.org> <75B6FA9F576969419E42BECB86CB1B89115F99A9@xmb-rcd-x06.cisco.com> <alpine.OSX.2.00.1402211620560.49053@ayourtch-mac> <75B6FA9F576969419E42BECB86CB1B89115F9BAE@xmb-rcd-x06.cisco.com> <alpine.OSX.2.00.1402212129450.52880@ayourtch-mac> <530C9CFD.2000409@acm.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/ipv6/vDMfc5XQFgY0D3yYf2o38nH9xHA
Cc: IETF IPv6 <ipv6@ietf.org>

Erik,

On Tue, 25 Feb 2014, Erik Nordmark wrote:

> On 2/21/14 12:36 PM, Andrew Yourtchenko wrote:
>> 
>> I myself did not like that portion of my reply, since it contained no 
>> quantifiable data (operating on the basis of a belief is not good 
>> engineering).
>> 
>> I happened to have a couple of hours of offline time, during which I tried 
>> to sketch the scenario and see how far I could get without building a lab.
>> 
>> It's a *very rough sketch*. If you see parts of it that could be made 
>> better, tell me.
> Andrew,
> I like this sketch. A few comments below.
>> 
>> The initial assumptions for this thought experiment are as follows:
>> 
>> 1) We have 10000 clients in a single /64.
>> 
>> 2) There are multiple APs that bridge the traffic from wired onto
>> wireless medium, with the client count limited to 100 per AP.
>> 
>> 3) There is a 20x speed difference between unicast transmission and
>> multicast transmission: the effective multicast speed is assumed to be
>> 1 Mbps, the effective unicast speed 20 Mbps.
>> 
>> 4) The APs are assumed to be "naive", i.e. they perform no snooping
>> and no multicast->unicast conversion, but at the same time they are able to 
>> bridge unicast traffic without flooding it between access points. I.e. we 
>> assume a model with a single router (or an FHRP pair) and a set of 100 APs 
>> bridging the traffic.
>> 
>> (Corollary from the above: the effective aggregate unicast capacity is 
>> 100 * 20 Mbps, whereas the effective multicast capacity is 1 * 1 Mbps, 
>> therefore the difference in throughput is 2000-fold).
>> 
>> Let's first consider the steady state. Suppose each host downloads
>> a file at 0.1 Mbps. Within each AP we therefore have 50% capacity 
>> utilization (0.1 * 100 = 10 Mbps out of the 20 Mbps capacity).
>> 
>> It's easy to see this comfortably accommodates all the hosts. Obviously
>> the unicast NUD portion of this traffic is fairly minimal, so I don't think 
>> it's even worth counting.
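
To make the arithmetic above easy to poke at, here it is as a few lines
of Python - purely the numbers assumed in the sketch, nothing measured:

    # Back-of-the-envelope capacity numbers for the scenario above.
    NUM_HOSTS = 10000       # clients in the /64
    HOSTS_PER_AP = 100      # per-AP client limit
    UNICAST_MBPS = 20.0     # effective unicast rate, per AP
    MULTICAST_MBPS = 1.0    # effective multicast rate, shared by all APs
    PER_HOST_MBPS = 0.1     # steady-state download per host

    num_aps = NUM_HOSTS // HOSTS_PER_AP                         # 100 APs
    aggregate_unicast = num_aps * UNICAST_MBPS                  # 2000 Mbps
    ratio = aggregate_unicast / MULTICAST_MBPS                  # 2000-fold
    utilization = HOSTS_PER_AP * PER_HOST_MBPS / UNICAST_MBPS   # 0.5

    print(num_aps, aggregate_unicast, ratio, utilization)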
>> 
>> Now, let's look at a potential failure.
>> 
>> At the time of the NUD probe, it's enough to lose the 3 retransmissions,
>> spaced 1 second apart. The default ReachableTime is 30 seconds,
>> randomized by a factor between 0.5 and 1.5.
>> 
>> So, all that is needed to achieve a mass NUD failure is a ~30 second
>> outage during a period when all the hosts are sending traffic.
>> 
>> A reboot of most networking gear takes several times longer than this.
>> 
>> Therefore, a crash of the default gateway during the peak hour is
>> one guaranteed trigger for this to happen.
>> 
>> So, now we have a situation where 10000 hosts have deleted their default 
>> gateway from the neighbor table, and are sending multicast neighbor 
>> solicitations for it.
>> 
>> Assume the NS is 64 bytes; 10000 hosts each sending such a packet per 
>> second means 64 bytes * 8 bits * 10000 hosts = 5,120,000 bits/sec - or 
>> ~5 Mbps.

> Just to make it clear, each round of NS retransmissions is ~5 Mbits and the 
> retransmission interval is 1 second. Hence with 4861 we get 5 Mbps for 3 
> seconds. However, the hosts aren't that likely to all discover the NUD 
> failure in the same 3-second window. With ReachableTime=30 seconds it 
> depends on when the last successful NUD probe or higher-level reachability 
> advice came in. Hmm - with TCP the reachability advice might come with every 
> advancement of the left window edge, i.e. they would synchronize quite 
> closely.

Yes, it's a bit of a fringe assumption here that all the hosts are active 
and then lose connectivity at once, with somewhat uniform implementations of 
TCP (or of things like name resolution). In reality I think they will be 
spread out more.
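
To put a rough number on that spread, here is a tiny simulation sketch -
all assumptions, no measurements: RFC 4861 defaults, ReachableTime drawn
between 0.5 and 1.5 times 30 seconds, the failing probe landing at a
random point in that window (I ignore the 5-second DELAY state, which
only shifts things), 3 lost unicast probes followed by 3 multicast NS of
64 bytes each:

    import random
    from collections import Counter
    random.seed(1)

    NUM_HOSTS = 10000
    NS_BITS = 64 * 8       # one 64-byte neighbor solicitation
    BASE_REACHABLE = 30.0  # RFC 4861 default BaseReachableTime, seconds
    PROBES = 3             # MAX_UNICAST_SOLICIT == MAX_MULTICAST_SOLICIT
    RETRANS = 1.0          # RetransTimer, seconds

    bits_per_second = Counter()
    for _ in range(NUM_HOSTS):
        reachable = BASE_REACHABLE * random.uniform(0.5, 1.5)
        # Outage at t=0: the first (failing) unicast probe lands somewhere
        # in the host's randomized ReachableTime window...
        first_probe = random.uniform(0.0, reachable)
        # ...and after 3 lost unicast probes the entry is deleted and the
        # host falls back to multicast NS - count those transmissions:
        for i in range(PROBES):
            t = first_probe + PROBES * RETRANS + i * RETRANS
            bits_per_second[int(t)] += NS_BITS

    print("peak: %.2f Mbps" % (max(bits_per_second.values()) / 1e6))

With these knobs the per-second peak comes out at a small fraction of
the synchronized 5 Mbps worst case; the TCP-driven synchronization you
describe would push it back towards that worst case.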

>
>> 
>> Note that this is only the shared bandwidth downstream back to the hosts - 
>> the upstream traffic is sent 20x faster, so I am gratuitously discarding 
>> its airtime.
>> 
>> Since the APs cannot send traffic at this rate, we will obviously
>> need to drop some of it. Note that if the clients succeed with ND to the 
>> default gateway, they will start streaming data again, so the effective 
>> multicast throughput will drop to 0.5 Mbps as soon as a noticeable 
>> portion of the clients recovers.
> While the multicast NS will see downstream drops, what matters for the 
> resolution is that the router receives it and responds. So why wouldn't 
> that part work?
> The NA from the router will be unicast, so if that unicast isn't dropped due 
> to the multicasts then it should resolve quickly. I don't know whether 
> unicast is prioritized higher than multicast or not.
>
> (We'll see downstream overload due to the multicasts, but that might just 
> last for 1 second if the NS/NA to/from the router gets through.)

Yes - since the router is on the wired side, the bandwidth degradation 
due to the "dummy" multicast retransmission downstream is the biggest 
problem. The NS goes upstream over the air at the unicast rate, then onto 
the wired side plus out again as multicast over the air, and the reply 
from the default gateway on the wired side comes back as unicast.
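
To see why that downstream re-flood dominates, a quick airtime
calculation with the same assumed rates (one NS again taken as 64
bytes):

    NS_BITS = 64 * 8
    UNICAST_BPS = 20e6   # host -> AP: sent at the unicast rate
    MULTICAST_BPS = 1e6  # AP -> hosts: re-flooded at the multicast rate

    air_up = NS_BITS / UNICAST_BPS      # ~26 microseconds per NS upstream
    air_down = NS_BITS / MULTICAST_BPS  # 512 microseconds per NS downstream

    # 10000 NS per second, each re-flooded on every AP's channel:
    print(10000 * air_down)  # 5.12 "seconds of airtime" per second

So the upstream copies and the wired side are noise, while the
downstream re-flood alone is ~5x oversubscribed - the same 5 Mbps vs
1 Mbps figure as above.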

>
>> 
>> Let's assume the best case of 20% of the clients managing to recover within 
>> the first second.
>> 
>> As we approach full recovery, fewer clients will be able to get their 
>> multicast NS sent, because the airtime is being taken by the payload 
>> traffic. Anyway, let's discard that and assume that every second 20% of 
>> the clients will recover.
>> 
>> This means that the recovery of the full set of clients under these 
>> conditions will take an *absolute* minimum of 5 seconds.
>> 
>> Did I prove myself wrong? It seems so. We can see that with some of the 
>> relaxed assumptions I took, the hosts do seem to recover.
>> 
>> But let's add to this a little bit of mDNS and other multicast-loving 
>> protocols, which tend to generate a fair chunk of traffic when they detect 
>> that the network was "restored".
>> 
>> This can shrink the available capacity several-fold. Add to this that a 
>> host does not necessarily stream at 100 kbps but might use higher data 
>> rates, and I think we can consider the available multicast capacity at 
>> startup to be 1/10th of what it is in theory.
> Yes, in general TCP will fill the pipe, thus at each AP all of the 
> downstream bandwidth will be used.
>
> But if the streaming comes via the default router (which crashed or was 
> unreachable for a few seconds), then the stream would also slow down due to 
> TCP, in which case there is less of a multicast overload issue. There might 
> be cases where this TCP slowdown doesn't happen though.

Yes, indeed. Also, given the high utilisation of the medium at that point, 
I wonder whether cross-channel interference will play a bigger role and 
therefore affect the results.

This seems like an extremely tricky beast to model with any reasonable 
accuracy!
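
That said, even crude knob-turning shows the shape of the problem. Below
is a discrete-time sketch of the recovery as assumed above; the
refail_prob knob is entirely my invention, meant to stand in for the
"orphan pool" re-failure and the gateway control-plane rate-limiting
described further down in the quoted text, and none of it is validated
against a real network:

    def recover(num_hosts=10000, recover_per_sec=2000, capacity_derate=1.0,
                refail_prob=0.0, max_seconds=300):
        """Seconds until all hosts recover, or None if we never get there.

        recover_per_sec - hosts whose multicast NS succeeds each second
                          (2000 = the 20%-of-all-hosts/sec assumed above)
        capacity_derate - multicast capacity left once mDNS & co pile on
                          (0.1 = the 1/10th case above)
        refail_prob     - per-second chance that a recovered host loses its
                          NUD probes under stress and rejoins the orphans
        """
        orphans, recovered = num_hosts, 0
        for t in range(1, max_seconds + 1):
            newly_ok = min(orphans, int(recover_per_sec * capacity_derate))
            refailed = int(recovered * refail_prob)
            orphans += refailed - newly_ok
            recovered += newly_ok - refailed
            if orphans == 0:
                return t
        return None  # the "locked" situation

    print(recover())                                       # -> 5
    print(recover(capacity_derate=0.1))                    # -> 50
    print(recover(capacity_derate=0.1, refail_prob=0.05))  # -> None (locked)

Once the re-failure inflow matches the recovery outflow, the orphan pool
never drains - which is exactly the locked situation sketched below.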

--a



>
>  Erik
>
>> 
>> This is where things may become interesting - 1/10th of the capacity 
>> means that it will take not 5, but 50 seconds for all the hosts to recover.
>> 
>> This means that the hosts which recovered in the very first second will 
>> already be sending NUD traffic while the network is still under stress. If 
>> these packets are lost, those hosts may fall back into the pool of "orphans" 
>> sending the multicast, because they delete their neighbor entry.
>> 
>> There are some other things to consider:
>> 
>> - I deliberately kept the scenario here to *one* ND entry.
>> Assume your hosts are talking with 3-4 other hosts besides the default 
>> gateway: this increases the load proportionally and makes the dangerous 
>> state easier to reach.
>> 
>> - Another factor that I am omitting: such a storm of ND towards the 
>> default gateway might trigger rate-limiting of control plane packets on 
>> the gateway. With some of the limits being as low as 1000 pps, this might 
>> give a recovery time on the order of minutes even without the wireless 
>> multicast being the bottleneck, while still resulting in a lot of multicast 
>> NS in the air during the slow recovery.
>> 
>> This is about as precise a construct as I can build.
>> 
>> Is it perfect? No, by no means. It also assumes that in the case of 
>> the default gateway the wireless performance will be the limiting factor - I 
>> think it won't be - so it's more of an appropriate scenario for the case of 
>> p2p communications on the network. The default-gateway-only case will be 
>> inherently much more stable, I think - because the multicast on the wired 
>> side is fast, so the drops on the wireless side will not matter.
>> 
>> It's probably worth changing the text to: "There is potential for the 
>> failing NUD to *contribute* to a longer recovery and possibly to the 
>> creation of a locked situation in the case of a flash failure - but the 
>> exact quantification of the impact in such an environment is a topic for 
>> further study".
>> 
>> And then maybe I could dump the above thought experiment into a separate 
>> draft, to see if folks could contribute to the experiment / maybe 
>> someone could run it - and reference it in this item?
>> 
>> It seems like an interesting area to dig into a bit more - creating a 
>> suitable model and playing with the parameters to see where it breaks seems 
>> like a useful exercise to understand how many hosts there can be in a 
>> single /64 on WiFi with a "naive" set of access points.
>> 
>> Thoughts ?
>> 
>> --a
>> 
>