Re: Comments on draft-yourtchenko-colitti-nd-reduce-multicast

Erik Nordmark <nordmark@acm.org> Wed, 26 February 2014 16:17 UTC

To: "Pascal Thubert (pthubert)" <pthubert@cisco.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/ipv6/uAzPKKudCSNTzrGCf5qFDlEVoio
Cc: "Andrew Yourtchenko (ayourtch)" <ayourtch@cisco.com>, IETF IPv6 <ipv6@ietf.org>

On 2/26/14 12:07 AM, Pascal Thubert (pthubert) wrote:
> I cannot find a reference to TOS bits or priority in 4861...
>
> Certainly NUD should be higher priority than traffic?
I don't think the question has come up in the past.
There would be no harm in using a different TClass for Neighbor Discovery
and/or a different L2 priority (e.g., 802.1p).
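
For what it's worth, a minimal sketch of the TClass knob (Python on
Linux; the UDP socket and the CS6/DSCP-48 value are just illustrative
assumptions - the kernel's own NS/NA generation is not controlled this
way):

    import socket

    # Traffic Class is 8 bits: DSCP in the upper 6 bits, ECN in the low 2.
    # CS6 (DSCP 48) is a common choice for network control traffic.
    IPV6_TCLASS = getattr(socket, "IPV6_TCLASS", 67)  # 67 on Linux

    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IPV6, IPV6_TCLASS, 48 << 2)
    # Packets sent on this socket now carry TClass 0xC0, which L2 gear
    # can map to a higher 802.1p priority.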

But it might not make much of a difference if the main point where the
priority matters is on the receiving host or router, when it passes the
packet to the CPU. For that, local implementation-specific mechanisms
can presumably determine that a packet is an ND packet and queue/process
it appropriately.

RFC 6583 talks about even finer-grained prioritization of ND traffic -
without any use of TClass or 802.1p.

Regards,
    Erik

>
> Pascal
>
>> On 25 Feb 2014, at 14:39, "Erik Nordmark" <nordmark@acm.org> wrote:
>>
>>> On 2/21/14 12:36 PM, Andrew Yourtchenko wrote:
>>>
>>> I myself did not like that portion of my reply, with no quantifiable data behind it (because operating on the basis of belief is not good engineering).
>>>
>>> I happened to have a couple of hours of offline time, during which I tried to sketch the scenario and get as far as I could without building a lab.
>>>
>>> It's a *very rough sketch*. If you think parts of it can be made better, tell me.
>> Andrew,
>> I like this sketch. A few comments below.
>>> The initial assumptions for this thought experiment are as follows:
>>>
>>> 1) We have 10000 clients in a single /64.
>>>
>>> 2) There are multiple APs that bridge the traffic from wired onto
>>> wireless medium, with the client count limited to 100 per AP.
>>>
>>> 3) There is a 20x speed difference between unicast and multicast
>>> transmission: the effective multicast rate is assumed to be 1 Mbps,
>>> the effective unicast rate 20 Mbps.
>>>
>>> 4) The APs are assumed to be "naive", i.e., they do not perform any snooping
>>> or multicast-to-unicast conversion, but at the same time they are able to bridge unicast traffic without flooding it between multiple access points. I.e., we assume a model with a single router (or an FHRP pair) and a set of 100 APs bridging the traffic.
>>>
>>> (Corollary from the above: the effective unicast capacity is 100*20 Mbps, whereas the effective multicast capacity is 1*1 Mbps, so the difference in aggregate throughput is 2000-fold.)
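>>>
>>> Sketching that arithmetic in Python (the AP count and the rates are
>>> just the assumptions from items 1-4 above):
>>>
>>>     APS            = 100  # 10000 clients / 100 clients per AP
>>>     UNICAST_MBPS   = 20   # effective unicast rate, per AP
>>>     MULTICAST_MBPS = 1    # effective multicast rate, shared by all
>>>
>>>     unicast_capacity = APS * UNICAST_MBPS     # 2000 Mbps aggregate
>>>     print(unicast_capacity / MULTICAST_MBPS)  # -> 2000.0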
>>>
>>> Let's first consider the steady state. Suppose each host downloads
>>> a file at 0.1 Mbps. Within each AP we therefore have 50% capacity utilization (0.1*100 = 10 Mbps out of the 20 Mbps capacity).
>>>
>>> It's easy to see this comfortably accommodates all the hosts. Obviously
>>> the unicast NUD within this traffic is fairly minimal, so I don't think it's even worth counting.
>>>
>>> Now, let's look at a potential failure.
>>>
>>> At the time of a NUD probe, it's enough to lose the 3 retransmissions,
>>> spaced 1 second apart.  The default ReachableTime is 30 seconds,
>>> jittered by a random factor between 0.5 and 1.5.
>>>
>>> So, all that is needed to achieve a mass NUD failure is a ~30 second
>>> outage during the period when all the hosts are sending the traffic.
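>>>
>>> A rough upper bound on that window, plugging in the RFC 4861
>>> defaults (treating the terms as simply additive is my
>>> simplification):
>>>
>>>     REACHABLE_TIME      = 30.0  # seconds, 4861 default base
>>>     MAX_RANDOM_FACTOR   = 1.5   # ReachableTime jitter is 0.5..1.5
>>>     DELAY_FIRST_PROBE   = 5.0   # DELAY state before the first probe
>>>     MAX_UNICAST_SOLICIT = 3     # probes, sent 1 second apart
>>>
>>>     worst = (REACHABLE_TIME * MAX_RANDOM_FACTOR
>>>              + DELAY_FIRST_PROBE + MAX_UNICAST_SOLICIT * 1.0)
>>>     print(worst)  # -> 53.0: an outage of under a minute catches
>>>                   #    every host that is actively sending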
>>>
>>> A reboot of the majority of the networking gear takes several times longer than this.
>>>
>>> Therefore, a crash of the default gateway during the peak hour is
>>> one guaranteed trigger for this to happen.
>>>
>>> So, now we have a situation of 10000 hosts which have deleted their default gateway from the neighbor table and are sending multicast neighbor solicitations for it.
>>>
>>> Assume the NS is 64 bytes; 10000 hosts each sending one such packet per second means 64 bytes * 8 bits * 10000 hosts = 5,120,000 bits/s - or, about 5 Mbps.
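>>>
>>> Or in Python, compared against the 1 Mbps multicast capacity assumed
>>> in item 3 (the one-NS-per-host-per-second rate is the 4861 retransmit
>>> timer):
>>>
>>>     NS_BITS = 64 * 8   # one 64-byte NS
>>>     HOSTS   = 10000
>>>     demand_mbps = NS_BITS * HOSTS / 1e6
>>>     print(demand_mbps)  # -> 5.12, roughly 5x the 1 Mbps multicast
>>>                         #    capacity, so most copies must be dropped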
>> Just to make it clear, each NS retransmission round is 5 Mbits and the retransmission interval is 1 second. Hence with 4861 we get 5 Mbps for 3 seconds. However, the hosts aren't that likely to all discover the NUD failure in the same 3-second window. With ReachableTime=30 seconds, it depends on when their last successful NUD probe or higher-level reachability confirmation came in. Hmm - with TCP, the reachability confirmation might come with every left window edge advancement, i.e., they would synchronize quite closely.
>>
>>> Note that this is only the shared downstream bandwidth back to the hosts - the upstream traffic is sent 20x faster, so I am gratuitously discarding it.
>>>
>>> Since the APs cannot send the traffic at this rate, obviously we will
>>> need to drop some of it. Note that if the clients succeed with the ND to the default gateway, they will start streaming data again, so the effective multicast throughput will drop to 0.5 Mbps as soon as a noticeable portion of the clients recovers.
>> While the multicast NS will see downstream drops, what matters for the resolution is that the router receives it and responds. So why wouldn't that part work?
>> The NA from the router will be unicast so if that unicast isn't dropped due to the multicasts then it should resolve quickly. I don't know if unicast is prioritized higher than multicast or not.
>>
>> (We'll see downstream overload due to the multicasts, but that might just last for 1 second if the NS/NA to/from the router gets through.)
>>
>>> Let's assume the best case of 20% of the clients managing to recover within the first second.
>>>
>>> As we approach full recovery, fewer clients will be able to get their multicast NS sent, because the airtime is being taken by the payload traffic. Anyway, let's discard that and assume that every second 20% of the clients recover.
>>>
>>> This means that under these conditions the recovery of the full set of clients will take an *absolute* minimum of 5 seconds.
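>>>
>>> A toy loop for that recovery, under the flat 20%-per-second
>>> assumption above:
>>>
>>>     waiting = 10000
>>>     second = 0
>>>     while waiting > 0:
>>>         second += 1
>>>         ns_load_mbps = waiting * 64 * 8 / 1e6
>>>         print(second, round(ns_load_mbps, 2))
>>>         waiting -= 2000  # 20% of the original population per second
>>>     # prints 5 lines: the NS load decays from 5.12 to 1.02 Mbps over
>>>     # the 5 seconds it takes everyone to recover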
>>>
>>> Did I prove myself wrong? It seems so. We can see that with some of the relaxed assumptions I took, the hosts do seem to recover.
>>>
>>> But let's add to this a little bit of mDNS and other multicast-loving protocols, which tend to generate a fair chunk of traffic when they detect that the network has been "restored".
>>>
>>> This can shrink the available capacity several-fold. Add to this that a host does not necessarily stream at 100 kbps but might use higher data rates, and I think we can consider the available
>>> multicast capacity at startup to be 1/10th of what it is in theory.
>> Yes, in general TCP will fill the pipe, so at each AP all of the downstream bandwidth will be used.
>>
>> But if the streaming comes via the default router (which crashed or was unreachable for a few seconds), then the stream would also slow down due to TCP, in which case there is less of a multicast overload issue. There might be cases where this TCP slowdown doesn't happen though.
>>
>>   Erik
>>
>>> This is where things may become interesting - 1/10th of the capacity means that it will take not 5, but 50 seconds for all the hosts to recover.
>>>
>>> This means that the hosts which recovered in the very first second will already be sending NUD traffic while the network is still under stress. If those packets are lost, the hosts might fall back into the pool of "orphans" sending multicast, because they delete their neighbor entry.
>>>
>>> There are some other things to consider:
>>>
>>> - I deliberately kept the scenario here with *one* ND entry.
>>> Assume your hosts are talking with 3-4 other hosts besides the default gateway; this increases the load proportionally and makes the dangerous state easier to reach.
>>>
>>> - another factor that I am omitting: such a storm of ND towards the default gateway might trigger rate-limiting of control-plane packets on the gateway. With some of the limits being as low as 1000 pps, this might give a recovery time on the order of minutes even without the wireless multicast being the bottleneck, while still resulting in a lot of multicast NS in the air during the slow recovery.
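>>>
>>> A lower bound for that case, assuming the policer admits exactly
>>> 1000 NS per second and every admitted NS resolves its sender (both
>>> generous assumptions):
>>>
>>>     hosts, limit_pps = 10000, 1000
>>>     seconds = 0
>>>     while hosts > 0:
>>>         seconds += 1
>>>         hosts -= min(limit_pps, hosts)
>>>     print(seconds)  # -> 10 seconds at the very best; with losses
>>>                     #    and retransmit timing, easily minutes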
>>>
>>> This is about as precise a construct as I can build.
>>>
>>> Is it perfect? No, by no means. It also assumes that in the default-gateway case the wireless performance will be the limiting factor - I think it won't be - so it's more of an appropriate scenario for p2p communications on the network. The default-gateway-only case will be inherently much more stable, I think, because multicast on the wired side is fast, so the drops on the wireless side will not matter.
>>>
>>> It's probably worth changing the text to: "There is potential for failing NUD to *contribute* to a longer recovery and the possible creation of a locked situation in the case of a flash failure - but the exact quantification of the impact in such an environment is a topic for further study".
>>>
>>> And then maybe I could dump the above thought experiment into a separate draft, to see if folks could contribute to the experiment - maybe someone could run it - and reference it in this item?
>>>
>>> It seems like an interesting area to dig into a bit more - creating a suitable model and playing with the parameters to see where it breaks seems like a useful exercise for understanding how many hosts there can be in a single /64 on WiFi with a "naive" set of access points.
>>>
>>> Thoughts ?
>>>
>>> --a