Re: draft-ietf-6man-resilient-rs update

Erik Nordmark <nordmark@acm.org> Thu, 21 August 2014 18:16 UTC

Message-ID: <53F6375D.1070907@acm.org>
Date: Thu, 21 Aug 2014 11:15:57 -0700
From: Erik Nordmark <nordmark@acm.org>
To: Dmitry Anipko <Dmitry.Anipko@microsoft.com>, Erik Nordmark <nordmark@acm.org>, Ole Troan <otroan@employees.org>
Subject: Re: draft-ietf-6man-resilient-rs update
Archived-At: http://mailarchive.ietf.org/arch/msg/ipv6/g9nau2nFaixVuzcfkNGz3sH6KAA
Cc: "<6man-chairs@tools.ietf.org>" <6man-chairs@tools.ietf.org>, "draft-ietf-6man-resilient-rs@tools.ietf.org" <draft-ietf-6man-resilient-rs@tools.ietf.org>, 6man WG <ipv6@ietf.org>, Suresh Krishnan <suresh.krishnan@ericsson.com>

On 8/14/14, 5:13 PM, Dmitry Anipko wrote:
> Hi Erik,
>
> After some offline discussion, here is the adjustment of the text we'd like to propose. Please let me know if this sufficiently mitigates your concern.
>
> Replace the first paragraph of Section 2.1 of http://tools.ietf.org/id/draft-ietf-6man-resilient-rs-03.txt with the following:
>
> <text>
> On multicast-capable links, the hosts following this specification
> SHOULD stop retransmitting the RSs when Router Discovery is successful
> (i.e. an RA with a non-zero Router Lifetime that results in a default
> route is received).
>
> When and if all of the following conditions are met:
> 1. The host received an RA with non-zero Router Lifetime since the
> last attachment of the interface to the link, and has not since
> received an RA with zero Router Lifetime from the same router.
> 2. Less than N seconds (where N is randomly chosen by the host at
> interface initialization time, with IRT < N <= 2*IRT) remain until
> the expiration of all of the known default routers on the interface.
What is IRT?
> the host MAY restart sending RSs following the proposed
> algorithm described in Section 2.
> </text>
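
The two conditions in the proposed text can be expressed as a single check. The following is a minimal Python sketch, not code from the draft: `RouterEntry`, `choose_n()` and `may_restart_rs()` are hypothetical names, times are plain seconds, and IRT is passed in as a parameter because its value is not stated in this thread.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class RouterEntry:
    # Hypothetical per-router state, created only for RAs received since the
    # interface last attached to the link.
    address: str
    expires_at: float   # absolute time (s) at which this default route expires
    last_lifetime: int  # Router Lifetime in the most recent RA from this router


def choose_n(irt: float, rng: random.Random) -> float:
    """Pick N once at interface initialization, with IRT < N <= 2*IRT."""
    while True:
        # rng.random() is in [0, 1), so n is in (IRT, 2*IRT]; the loop only
        # guards against floating-point rounding landing exactly on IRT.
        n = 2 * irt - rng.random() * irt
        if n > irt:
            return n


def may_restart_rs(routers: List[RouterEntry], now: float, n: float) -> bool:
    """True if both conditions of the proposed text hold."""
    # Condition 1: some router's most recent RA had a non-zero Router Lifetime,
    # i.e. it was not later followed by a zero-lifetime RA from that router.
    default_routers = [r for r in routers if r.last_lifetime > 0]
    if not default_routers:
        return False
    # Condition 2: less than N seconds remain until *all* known default
    # routers have expired, i.e. until the latest expiry time.
    remaining = max(r.expires_at for r in default_routers) - now
    return remaining < n


router = RouterEntry("fe80::1", expires_at=1800.0, last_lifetime=1800)
print(may_restart_rs([router], now=1000.0, n=6.0))  # False: 800 s remain
print(may_restart_rs([router], now=1795.0, n=6.0))  # True: 5 s remain
```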

This ensures that as long as there is no outage there would be no extra 
RS messages.

However, if there is some outage, all the hosts are likely to flood the
network with RS messages at the same time.
(For instance, if L2 is fine but some L3 breakage results in the router 
not getting the RAs out.)

The "at the same time" part would be quite bad in a large network, e.g.,
10,000 hosts on the link, whether Wi-Fi or cellular.
One could try to address that by adding a random delay, but that is a 
bit hard because the host doesn't know whether there are 10 or 10,000 
hosts on the link.
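
To make the synchronization concern concrete, here is a rough Python sketch. It assumes that every host heard the same RAs (so all default routes expire at the same instant) and that IRT is 4 seconds; both are illustrative assumptions, and `first_rs_burst()` is a hypothetical helper. Because N is drawn from (IRT, 2*IRT], the first restarted RSs always fall inside a window no wider than IRT, so the average RS rate in that window grows linearly with the number of hosts.

```python
import random
from typing import Tuple


def first_rs_burst(num_hosts: int, irt: float, seed: int = 0) -> Tuple[float, float]:
    """Spread of the first restarted RSs when every host shares one expiry time.

    Returns (span, min_rate): the width in seconds of the window holding all
    first RSs, and a lower bound on their average rate in RSs per second.
    """
    rng = random.Random(seed)
    expiry = 0.0  # all hosts heard the same RAs, so their routes expire together
    # Each host starts N seconds before expiry, with IRT < N <= 2*IRT.
    starts = [expiry - (2 * irt - rng.random() * irt) for _ in range(num_hosts)]
    span = max(starts) - min(starts)
    # The window is never wider than IRT, so the average rate over it is at
    # least num_hosts / IRT, no matter how many hosts share the link.
    return span, num_hosts / irt


for hosts in (10, 10_000):
    span, rate = first_rs_burst(hosts, irt=4.0)  # IRT = 4 s is an assumption
    print(f"{hosts} hosts: first RSs within {span:.2f} s, >= {rate:.1f} RS/s")
```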

Furthermore, the algorithm in Section 2 sends multicast RSs. That is
needed if the last router silently died and a replacement router (with a
different MAC and IPv6 address) has been put in place instead. But sending
10,000 multicast RSs at the same time seems bad.

But if the issue is instead that the link isn't very good at delivering 
the multicast RAs and the router is fine, then a unicast RS would suffice.

But I still don't understand (or don't agree with) the problem you are
trying to solve.

Is the problem you want to solve a misconfigured router (which sends 
periodic RAs less frequently than the default router lifetime)?
Is it links where multicast RAs are so unreliable that the host
can miss all 3 that should arrive before the default router lifetime
expires?
Routers which get replaced (without VRRP or HSRP) *and* do not send an 
initial set of multicast RAs?
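
The "3" here presumably follows from the RFC 4861 defaults (MaxRtrAdvInterval = 600 s, AdvDefaultLifetime = 3 * MaxRtrAdvInterval = 1800 s): since unsolicited RAs are at most MaxRtrAdvInterval apart, at least three are sent within one Router Lifetime. The Python sketch below shows that arithmetic and, under the simplifying assumption that each multicast RA is lost independently with the same probability, how likely a host is to miss all of them. The function names are hypothetical. The last line covers the misconfiguration case from the first question: if RAs are sent less often than the Router Lifetime, no unsolicited RA is guaranteed before expiry.

```python
# RFC 4861 default router configuration values (seconds).
MAX_RTR_ADV_INTERVAL = 600
ADV_DEFAULT_LIFETIME = 3 * MAX_RTR_ADV_INTERVAL  # 1800


def ras_before_expiry(router_lifetime: float, max_interval: float) -> int:
    """Minimum number of unsolicited RAs sent within one Router Lifetime.

    Unsolicited RAs are at most max_interval apart, so at least this many
    are sent after the last RA the host heard, by the time its route expires.
    """
    return int(router_lifetime // max_interval)


def p_route_lost(loss: float, num_ras: int) -> float:
    """Chance of missing every RA, assuming independent loss per RA."""
    return loss ** num_ras


n = ras_before_expiry(ADV_DEFAULT_LIFETIME, MAX_RTR_ADV_INTERVAL)
print(n)                              # 3
print(p_route_lost(0.5, n))           # 0.125
# Misconfiguration: RAs every 2000 s but a 1800 s lifetime leaves no margin.
print(ras_before_expiry(1800, 2000))  # 0
```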

For links with unreliable (or no) multicast RAs there are much more 
robust ways to do this.

    Erik

>
> -Dmitry
>
>
> -----Original Message-----
> From: Erik Nordmark [mailto:nordmark@acm.org]
> Sent: Thursday, August 14, 2014 4:41 PM
> To: Dmitry Anipko; Ole Troan
> Cc: <6man-chairs@tools.ietf.org>; draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG; Suresh Krishnan
> Subject: Re: draft-ietf-6man-resilient-rs update
>
>
> [Delay in responding due to travel and vacation]
>
> On 7/25/14, 5:21 PM, Dmitry Anipko wrote:
>> Let's separate discussions on the substance and on what exact language we put in the draft.
>>
>> Substance-wise, IMO, the change tries to reduce the existing fragility of RD in some situations. We had real-world support cases with one of the top mobile operators in the US, where the network was not sending new RAs before the information in the initial one expired. Yes, this was a misconfiguration / bug on the network side. Yes, this is where it was eventually fixed, and yes, perhaps the support teams should not have taken as much time to figure it out. But end users don't care about that part, and they would be better off in the meantime if the protocol were more resilient against misconfiguration or packet loss. Speaking for myself only, this is the scenario I'm concerned about. I'm sure there are others who know the "no multicast RAs, period" scenario better than I do, and they can speak for that.
> Dmitry,
> I don't understand how we can make the protocol specification more robust against the above type of misconfiguration, at least not without making the protocol a lot more chatty.
>
> For instance, suppose we say that hosts should send a unicast RS every
> 10 seconds. That would be quite chatty and would still result in the same failures if the operator accidentally configures the router (and/or
> prefix) lifetime to be 5 seconds.
> And having the hosts look at the remaining lifetimes and decide to unicast (or multicast) RSes closer and closer together as the
> lifetime(s) get closer to expiry might result in an explosion of RS messages during a network outage.
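
The example above can be checked with simple arithmetic. This Python sketch uses hypothetical function names, and the 10,000-host figure is the one used elsewhere in this thread. It computes the load that periodic unicast RSs put on the router, and the necessary condition for polling to keep a default route alive at all.

```python
def unicast_rs_load(num_hosts: int, rs_period: float) -> float:
    """RSs per second the router receives if every host polls periodically."""
    return num_hosts / rs_period


def polling_can_refresh(rs_period: float, router_lifetime: float) -> bool:
    """Necessary (not sufficient) condition for polling to keep the route.

    A solicited RA can only refresh the default route if the next RS goes out
    before the previously advertised Router Lifetime runs out.
    """
    return rs_period < router_lifetime


print(unicast_rs_load(10_000, 10))    # 1000.0 RS/s at the router
print(polling_can_refresh(10, 5))     # False: the route lapses between polls
print(polling_can_refresh(10, 1800))  # True
```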
>
>      Erik
>
>> So I think it would be good to first understand, leaving specific RFC language aside, whether there is consensus to improve RD reliability when the host sees that previously received information has expired or is about to expire. My impression from the discussion in London was that there was such consensus, but perhaps I misunderstood. Letting the host do additional rate-limited transmissions to solve that looks to me like a simple and reasonable approach. If you agree with that, then what's the specific concern about putting that in as a MAY?
>>
>> I do disagree with the viewpoint that, because ND's use of multicast is inefficient, we should not make any improvements to ND until the multicast issue is solved. Yes, the multicast-related issues in general do need to be addressed over the long term. But addressing them will take time, and there are issues today that end users are hitting, for which incremental, limited fixes within the existing framework are possible and reasonable.
>>
>>>> today, links without periodic RAs are not supported.
>> Since this is a statement about today, it equally applies (or does not apply) to the option 2 you suggested and to the option I suggested, so it's not a differentiator. Furthermore, it is basically equivalent to option 1, which makes it not an independent argument but a restatement of option 1, so it can't be used as a differentiator there either.
>>
>> -----Original Message-----
>> From: Ole Troan [mailto:otroan@employees.org]
>> Sent: Thursday, July 24, 2014 8:25 PM
>> To: Dmitry Anipko
>> Cc: Suresh Krishnan; <6man-chairs@tools.ietf.org>;
>> draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG
>> Subject: Re: draft-ietf-6man-resilient-rs update
>>
>>>>> wouldn't those hosts initially do router discovery, then not ever send RS again, and would be left without a default router when the router lifetime expired?
>>> Assuming there are no other events, such as a media state change, correct, and that's the behavior they get today. As I understand it, this MAY is about a specific behavior improvement, compared to today's state, which a vendor may choose to implement.
>> today, links without periodic RAs are not supported.
>>
>> cheers,
>> Ole
>>
>>> -----Original Message-----
>>> From: Ole Troan [mailto:otroan@employees.org]
>>> Sent: Wednesday, July 23, 2014 8:01 PM
>>> To: Dmitry Anipko
>>> Cc: Suresh Krishnan; <6man-chairs@tools.ietf.org>;
>>> draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG
>>> Subject: Re: draft-ietf-6man-resilient-rs update
>>>
>>> Dmitry,
>>>
>>>>>> what would happen with the hosts that didn't implement the feature on links w/o periodic multicast RAs?
>>>> They would not get this particular improvement in behavior compared to the pre-draft state. But since the answer seems trivial, I'm not sure I understood the question correctly.
>>> wouldn't those hosts initially do router discovery, then not ever send RS again, and would be left without a default router when the router lifetime expired?
>>>
>>> cheers,
>>> Ole
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ole Troan [mailto:otroan@employees.org]
>>>> Sent: Tuesday, July 22, 2014 7:15 PM
>>>> To: Dmitry Anipko
>>>> Cc: Suresh Krishnan; <6man-chairs@tools.ietf.org>;
>>>> draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG
>>>> Subject: Re: draft-ietf-6man-resilient-rs update
>>>>
>>>> Dmitry,
>>>>
>>>>> Given the definition of MAY in RFC 2119, why is the wording, which is currently in the text, not an option?
>>>>>
>>>>> <quote>
>>>>> In the
>>>>> same vein an implementation which does include a particular option
>>>>> MUST be prepared to interoperate with another implementation which
>>>>> does not include the option (except, of course, for the feature the
>>>>> option provides.) </quote>
>>>>>
>>>>> Where "the feature" would be e.g. "supporting links w/o periodic multicast RAs"?
>>>> what would happen with the hosts that didn't implement the feature on links w/o periodic multicast RAs?
>>>>
>>>> cheers,
>>>> Ole
>>>>
>>>>> -----Original Message-----
>>>>> From: Ole Troan [mailto:otroan@employees.org]
>>>>> Sent: Tuesday, July 22, 2014 2:10 PM
>>>>> To: Suresh Krishnan
>>>>> Cc: Dmitry Anipko; <6man-chairs@tools.ietf.org>;
>>>>> draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG
>>>>> Subject: Re: draft-ietf-6man-resilient-rs update
>>>>>
>>>>> Suresh, et al,
>>>>>
>>>>> to pick up this after my holiday.
>>>>>
>>>>>>>> My impression from the discussion in London was that the consensus was closer to (2) than (1): that on such links the hosts should be allowed to restart transmissions in a reasonable / rate-limited manner when connectivity parameters expire, and that we should add MAY text for that.
>>>>>>> that's also fine with me. if there was a text proposal it would most likely be easier for the working group to come to a decision.
>>>>>>> is that something the authors can take on?
>>>>>> In my view, the text in the draft already reflects something akin to what Dmitry was talking about (but not exactly option 2 from your original mail on this thread). This is the relevant text from the draft.
>>>>>> Hosts MAY continue retransmitting the RSs even after router discovery is successful.  If the host continues to retransmit RSs, it is RECOMMENDED that such retransmissions be rate-limited to one every MRT.
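
The rate limit in the quoted draft text can be sketched as a simple per-interface limiter. This is an illustration only: `RsRateLimiter` is a hypothetical class, and MRT = 3600 s is an assumed value since the thread does not state it.

```python
from typing import Optional


class RsRateLimiter:
    """Permit at most one RS retransmission every MRT seconds."""

    def __init__(self, mrt: float) -> None:
        self.mrt = mrt
        self.last_sent: Optional[float] = None  # time of the last permitted RS

    def try_send(self, now: float) -> bool:
        """Return True (and record the send) if an RS may go out at `now`."""
        if self.last_sent is None or now - self.last_sent >= self.mrt:
            self.last_sent = now
            return True
        return False


limiter = RsRateLimiter(mrt=3600)  # MRT = 3600 s is an assumed value
print([limiter.try_send(t) for t in (0, 10, 3599, 3600, 3700)])
# [True, False, False, True, False]
```
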
>>>>> the current text states: hosts SHOULD stop, but MAY continue.
>>>>> on a link where there are no periodic multicast RAs, all the hosts that don't implement the MAY will fail once the initial Router Lifetime expires.
>>>>>
>>>>> to me it appears we have two options.
>>>>> 1) only solve the initial connection case.
>>>>> remove any mention of links without periodic RAs (remove the last sentence of the abstract, and bullet 1b.).
>>>>>    possibly remove the MAY in the "SHOULD stop, but MAY continue"
>>>>>    advance the document to the IESG
>>>>> 2) take the document back to the WG and solve the problems that Erik outlined in his mail.
>>>>>
>>>>> cheers,
>>>>> Ole
>>
>