Re: draft-ietf-6man-resilient-rs update

Dmitry Anipko <Dmitry.Anipko@microsoft.com> Tue, 09 September 2014 20:54 UTC

From: Dmitry Anipko <Dmitry.Anipko@microsoft.com>
To: Erik Nordmark <nordmark@acm.org>, Ole Troan <otroan@employees.org>
Subject: Re: draft-ietf-6man-resilient-rs update
Date: Tue, 09 Sep 2014 20:54:10 +0000
Archived-At: http://mailarchive.ietf.org/arch/msg/ipv6/QV7Uy0q6Plbv-P1sWGzODNKWto8
Cc: "<6man-chairs@tools.ietf.org>" <6man-chairs@tools.ietf.org>, "draft-ietf-6man-resilient-rs@tools.ietf.org" <draft-ietf-6man-resilient-rs@tools.ietf.org>, 6man WG <ipv6@ietf.org>, Suresh Krishnan <suresh.krishnan@ericsson.com>

Hello Erik,

Sorry for the delay.

Would your (RS or RA) "storm" concern be mitigated if N were made larger (on the order of a minute or more), and within that time frame the first few RSs the host sends were required to be unicast? If those received no response, the host would restart the algorithm from section 2.
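A minimal sketch of the schedule proposed above, with hypothetical values filled in where the proposal is deliberately open: N is drawn uniformly from 60 to 120 seconds (standing in for "order of a minute(s)"), three unicast RSs (standing in for "the first few") are spread over the first half of that window, and then a multicast RS to the all-routers group ff02::2 restarts the section 2 algorithm. The names `plan_refresh`, `PlannedRS` and the constants are illustrative only, not taken from the draft.

```python
import random
from dataclasses import dataclass

# Hypothetical parameters: the proposal only says "order of a minute(s)"
# and "the first few RSs", so these values are placeholders.
N_MIN_S = 60.0
N_MAX_S = 120.0
UNICAST_ATTEMPTS = 3
ALL_ROUTERS = "ff02::2"  # all-routers link-local multicast group


@dataclass
class PlannedRS:
    send_at: float    # absolute time in seconds
    destination: str  # known router's address, or ALL_ROUTERS


def plan_refresh(router_addr: str, expires_at: float,
                 rng: random.Random) -> list[PlannedRS]:
    """Plan the RSs to send before the default router at router_addr expires.

    The caller is expected to drop the remaining entries as soon as an RA
    refreshes the router lifetime.
    """
    n = rng.uniform(N_MIN_S, N_MAX_S)
    start = expires_at - n
    step = (n / 2) / UNICAST_ATTEMPTS
    # First try unicast RSs to the router the host already knows about.
    plan = [PlannedRS(start + i * step, router_addr)
            for i in range(UNICAST_ATTEMPTS)]
    # If none of them was answered, restart the section 2 algorithm,
    # which begins with a multicast RS.
    plan.append(PlannedRS(start + UNICAST_ATTEMPTS * step, ALL_ROUTERS))
    return plan


if __name__ == "__main__":
    for rs in plan_refresh("fe80::1", 1800.0, random.Random(7)):
        print(f"t={rs.send_at:8.1f}s -> {rs.destination}")
```

Randomizing N per host spreads the refreshes out, and starting with unicast keeps the common case (router alive, multicast RAs lost or late) off the multicast channel; multicast is used only when the known router does not answer.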

>> But I still don't understand (or don't agree with) the problem you are
>> trying to solve.

The problem I'm trying to solve is reducing the chance that a host ends up without a routing configuration in cases where it could otherwise have one. In the specific example I've seen, this was caused by a router misconfiguration, but I can't say whether there are other causes for it in practice.

Thank you.
________________________________________
From: Erik Nordmark <nordmark@acm.org>
Sent: Thursday, August 21, 2014 11:15 AM
To: Dmitry Anipko; Erik Nordmark; Ole Troan
Cc: <6man-chairs@tools.ietf.org>; draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG; Suresh Krishnan
Subject: Re: draft-ietf-6man-resilient-rs update

On 8/14/14, 5:13 PM, Dmitry Anipko wrote:
> Hi Erik,
>
> After some offline discussion, here is the adjustment of the text we'd like to propose. Please let me know if this sufficiently mitigates your concern.
>
> Replace the first paragraph of section 2.1 of http://tools.ietf.org/id/draft-ietf-6man-resilient-rs-03.txt with the following:
>
> <text>
> On multicast-capable links, the hosts following this specification
> SHOULD stop retransmitting the RSs when Router Discovery is successful
> (i.e. an RA with a non-zero Router Lifetime that results in a default
> route is received).
>
> When and if all of the following conditions are met:
> 1. The host received an RA with non-zero Router Lifetime since the
> last attachment of the interface to the link, and has not since
> received an RA with zero Router Lifetime from the same router.
> 2. Less than N seconds (where N is randomly chosen by the host at the
> interface initialization time to be IRT < N <= 2*IRT) remain until the
> expiration of all of the known default routers on the interface.
What is IRT?
> the host MAY restart sending RSs following the Proposed
> algorithm described in section 2.
> </text>

This ensures that as long as there is no outage there would be no extra
RS messages.
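The two conditions in the proposed text can be read as a simple per-interface predicate. The Python sketch below is one possible reading, not the draft's definition: IRT is left as a constructor parameter because the thread does not define it, N is drawn once per interface as 2*IRT - IRT*u with u in [0, 1) so that IRT < N <= 2*IRT, and the class and method names are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DefaultRouter:
    expires_at: float        # absolute time the Router Lifetime runs out
    withdrawn: bool = False  # a later RA from it carried Router Lifetime 0


@dataclass
class Interface:
    irt: float               # IRT as used in the proposal; value is assumed
    rng: random.Random
    routers: dict[str, DefaultRouter] = field(default_factory=dict)
    n: float = 0.0

    def __post_init__(self) -> None:
        # N is chosen once, at interface initialization: IRT < N <= 2*IRT.
        self.n = 2 * self.irt - self.irt * self.rng.random()

    def on_attach(self) -> None:
        # Condition 1 only counts RAs received since the last attachment.
        self.routers.clear()

    def on_ra(self, router: str, router_lifetime: float, now: float) -> None:
        if router_lifetime > 0:
            self.routers[router] = DefaultRouter(now + router_lifetime)
        elif router in self.routers:
            self.routers[router].withdrawn = True

    def may_restart_rs(self, now: float) -> bool:
        live = [r.expires_at for r in self.routers.values() if not r.withdrawn]
        if not live:
            return False  # condition 1 not met
        # Condition 2: fewer than N seconds remain until *all* known
        # default routers have expired, i.e. until the latest expiry.
        return max(live) - now < self.n
```

While periodic RAs keep arriving, each one pushes expires_at forward, so may_restart_rs stays false and no extra RSs are sent. It only becomes true once the RAs stop, which is the outage case discussed in the next paragraph.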

However, if there is some outage all the hosts are likely to flood the
network with RS messages at the same time.
(For instance, if L2 is fine but some L3 breakage results in the router
not getting the RAs out.)

The "at the same time" part would be quite bad in a large network, e.g.,
10,000 hosts on the link, whether WiFi or cellular.
One could try to address that by adding a random delay, but that is a
bit hard because the host doesn't know whether there are 10 or 10,000
hosts on the link.
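To put rough numbers on the random-delay problem, here is a sketch assuming each host independently picks a uniform random send time within a fixed window of W seconds (a hypothetical mitigation, not something the draft specifies). The average load is hosts/W RSs per second, so with W = 10 seconds a window that is harmless for 10 hosts still means about 1,000 RSs per second for 10,000 hosts. The function name is illustrative.

```python
import random
from collections import Counter


def peak_rs_per_second(hosts: int, window_s: int, rng: random.Random) -> int:
    """Each host sends one RS at a random whole second in [0, window_s);
    return how many RSs land in the busiest one-second slot."""
    slots = Counter(rng.randrange(window_s) for _ in range(hosts))
    return max(slots.values())


if __name__ == "__main__":
    rng = random.Random(2014)
    window_s = 10  # hypothetical fixed delay window
    for hosts in (10, 10_000):
        print(f"{hosts} hosts: {hosts / window_s:.0f} RS/s on average, "
              f"{peak_rs_per_second(hosts, window_s, rng)} in the busiest second")
```

Since a host cannot observe how many other hosts share the link, it has no principled way to choose W, which is the difficulty raised above.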

Furthermore, the algorithm in section 2 sends multicast RSs. That is
needed if the last router silently died and a replacement router (with a
different MAC and IPv6 address) has been put in place instead. 10,000
multicast RSs at the same time seems bad.

But if the issue is instead that the link isn't very good at delivering
the multicast RAs and the router is fine, then a unicast RS would suffice.

But I still don't understand (or don't agree with) the problem you are
trying to solve.

Is the problem you want to solve a misconfigured router (which sends
periodic RAs less frequently than the default router lifetime)?
Is it links where multicast (RAs) are very unreliable, hence the host
can miss the 3 that should appear before the default router lifetime
expires?
Routers which get replaced (without VRRP or HSRP) *and* do not send an
initial set of multicast RAs?
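For the second question, a back-of-the-envelope estimate. RFC 4861 defaults AdvDefaultLifetime to 3 * MaxRtrAdvInterval, which is where the 3 RAs per lifetime come from. Assuming, as a simplification, that each multicast RA is lost independently with probability p (real wireless loss is often bursty, which this model does not capture), a host misses all k RAs in one lifetime with probability p**k. The function names are illustrative.

```python
def p_miss_all(loss: float, ras_per_lifetime: int = 3) -> float:
    """Probability that one host misses every periodic RA sent during one
    Router Lifetime, assuming independent losses with probability `loss`."""
    if not 0.0 <= loss <= 1.0:
        raise ValueError("loss must be a probability")
    return loss ** ras_per_lifetime


def expected_stranded_hosts(hosts: int, loss: float,
                            ras_per_lifetime: int = 3) -> float:
    """Expected number of hosts that lose their default route in one
    Router Lifetime, absent any other trigger to send an RS."""
    return hosts * p_miss_all(loss, ras_per_lifetime)


if __name__ == "__main__":
    for loss in (0.01, 0.1, 0.3):
        print(f"loss={loss}: {expected_stranded_hosts(10_000, loss):.1f} "
              f"of 10,000 hosts")
```

Under this model, 10% independent loss leaves about 10 of 10,000 hosts without a default route per lifetime, while 1% loss leaves about 0.01.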

For links with unreliable (or no) multicast RAs there are much more
robust ways to do this.

    Erik

>
> -Dmitry
>
>
> -----Original Message-----
> From: Erik Nordmark [mailto:nordmark@acm.org]
> Sent: Thursday, August 14, 2014 4:41 PM
> To: Dmitry Anipko; Ole Troan
> Cc: <6man-chairs@tools.ietf.org>; draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG; Suresh Krishnan
> Subject: Re: draft-ietf-6man-resilient-rs update
>
>
> [Delay in responding due to travel and vacation]
>
> On 7/25/14, 5:21 PM, Dmitry Anipko wrote:
>> Let's separate discussions on the substance and on what exact language we put in the draft.
>>
>> Substance-wise, IMO, the change tries to reduce the existing fragility of RD in some situations. We had very real, real-world support cases with one of the top mobile operators in the US, where the network was not sending new RAs before the information in the initial one expired. Yes, this was a misconfiguration / bug on the network side. Yes, this is where it was eventually fixed, and yes, perhaps the support teams should not have taken as much time to figure it out. But end users don't care about that part, and they would have been better off in the meantime if the protocol had more resiliency against misconfiguration or packet loss. Speaking for myself only, this is the scenario I'm concerned about. I'm sure there are others who know the "no multicast RAs, period" scenario better than I do, and they can speak for that.
> Dmitry,
> I don't understand how we can make the protocol specification more robust against the above type of misconfiguration - at least not without resulting in a lot more chatty protocol.
>
> For instance, suppose we say that hosts should send a unicast RS every
> 10 seconds. That would be quite chatty and would still result in the same failures if the operator accidentally configures the router (and/or
> prefix) lifetime to be 5 seconds.
> And having the hosts look at the remaining lifetimes and decide to unicast (or multicast) RSes closer and closer together as the
> lifetime(s) get closer to expiry might result in an explosion of RS messages during a network outage.
>
>      Erik
>
>> So I think it would be good to first understand, leaving specific RFC language aside, whether there is consensus to improve RD reliability when the host sees that previously received information has expired or is about to expire. My impression from the discussion in London was that there was such consensus, but perhaps I misunderstood. Letting the host do additional rate-limited transmissions to solve that looks to me like a simple and reasonable approach. If you agree with that, then what's the specific concern with putting that in as a MAY?
>>
>> I do disagree with the viewpoint that, because ND's use of multicast is inefficient, we should not make any improvements to ND until the multicast issue is solved. Yes, the multicast-related issues in general do need to be addressed over the long term. But addressing them will take time, and there are issues today that end users are hitting, for which incremental, limited fixes within the existing framework are possible and reasonable.
>>
>>>> today, links without periodic RAs are not supported.
>> Since this is a statement about today, it applies equally (or does not apply) to option 2 you suggested and to the option I suggested, so it's not a differentiator. Furthermore, it is basically equivalent to option 1, which makes it not an independent argument but a restatement of option 1, so it can't be used as a differentiator there either.
>>
>> -----Original Message-----
>> From: Ole Troan [mailto:otroan@employees.org]
>> Sent: Thursday, July 24, 2014 8:25 PM
>> To: Dmitry Anipko
>> Cc: Suresh Krishnan; <6man-chairs@tools.ietf.org>;
>> draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG
>> Subject: Re: draft-ietf-6man-resilient-rs update
>>
>>>>> wouldn't those hosts initially do router discovery, then not ever send RS again, and would be left without a default router when the router lifetime expired?
>>> Correct, assuming there are no other events such as a media state change, and that's the behavior they get today. As I understand it, this MAY is about a specific behavior improvement over today's state, which a vendor may choose to implement.
>> today, links without periodic RAs are not supported.
>>
>> cheers,
>> Ole
>>
>>> -----Original Message-----
>>> From: Ole Troan [mailto:otroan@employees.org]
>>> Sent: Wednesday, July 23, 2014 8:01 PM
>>> To: Dmitry Anipko
>>> Cc: Suresh Krishnan; <6man-chairs@tools.ietf.org>;
>>> draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG
>>> Subject: Re: draft-ietf-6man-resilient-rs update
>>>
>>> Dmitry,
>>>
>>>>>> what would happen with the hosts that didn't implement the feature on links w/o periodic multicast RAs?
>>>> They would not get this particular improvement in behavior compared to the pre-draft state. But since the answer seems trivial, I'm not sure I understood the question correctly?
>>> wouldn't those hosts initially do router discovery, then not ever send RS again, and would be left without a default router when the router lifetime expired?
>>>
>>> cheers,
>>> Ole
>>>
>>>
>>>> -----Original Message-----
>>>> From: Ole Troan [mailto:otroan@employees.org]
>>>> Sent: Tuesday, July 22, 2014 7:15 PM
>>>> To: Dmitry Anipko
>>>> Cc: Suresh Krishnan; <6man-chairs@tools.ietf.org>;
>>>> draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG
>>>> Subject: Re: draft-ietf-6man-resilient-rs update
>>>>
>>>> Dmitry,
>>>>
>>>>> Given the definition of MAY in RFC 2119, why is the wording, which is currently in the text, not an option?
>>>>>
>>>>> <quote>
>>>>> In the
>>>>> same vein an implementation which does include a particular option
>>>>> MUST be prepared to interoperate with another implementation which
>>>>> does not include the option (except, of course, for the feature the
>>>>> option provides.) </quote>
>>>>>
>>>>> Where "the feature" would be e.g. "supporting links w/o periodic multicast RAs"?
>>>> what would happen with the hosts that didn't implement the feature on links w/o periodic multicast RAs?
>>>>
>>>> cheers,
>>>> Ole
>>>>
>>>>> -----Original Message-----
>>>>> From: Ole Troan [mailto:otroan@employees.org]
>>>>> Sent: Tuesday, July 22, 2014 2:10 PM
>>>>> To: Suresh Krishnan
>>>>> Cc: Dmitry Anipko; <6man-chairs@tools.ietf.org>;
>>>>> draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG
>>>>> Subject: Re: draft-ietf-6man-resilient-rs update
>>>>>
>>>>> Suresh, et al,
>>>>>
>>>>> to pick up this after my holiday.
>>>>>
>>>>>>>> My impression from the discussion in London was that the consensus was closer to (2) than (1): that on such links the hosts should be allowed to restart transmissions in a reasonable, rate-limited manner when connectivity parameters expire, and that we should add MAY text for that.
>>>>>>> that's also fine with me. if there was a text proposal it would most likely be easier for the working group to come to a decision.
>>>>>>> is that something the authors can take on?
>>>>>> In my view, the text in the draft already reflects something akin to what Dmitry was talking about (but not exactly option 2 from your original mail on this thread). This is the relevant text from the draft.
>>>>>> Hosts MAY continue retransmitting the RSs even after router discovery is successful.  If the host continues to retransmit RSs, it is RECOMMENDED that such retransmissions be rate-limited to one every MRT.
>>>>> the current text states: hosts SHOULD stop, but MAY continue.
>>>>> on a link where there are no periodic multicast RAs, all the hosts that don't implement the MAY will fail once the initial Router Lifetime expires.
>>>>>
>>>>> to me it appears we have two options.
>>>>> 1) only solve the initial connection case.
>>>>> remove any mention of links without periodic RAs (remove the last sentence of the abstract, and bullet 1b.).
>>>>>    possibly remove the MAY in the "SHOULD stop, but MAY continue"
>>>>>    advance the document to the IESG
>>>>> 2) take the document back to the WG and solve the problems that Erik outlined in his mail.
>>>>>
>>>>> cheers,
>>>>> Ole