Re: draft-ietf-6man-resilient-rs update

Dmitry Anipko <Dmitry.Anipko@microsoft.com> Tue, 16 September 2014 20:25 UTC

From: Dmitry Anipko <Dmitry.Anipko@microsoft.com>
To: Erik Nordmark <nordmark@acm.org>, Ole Troan <otroan@employees.org>
Subject: Re: draft-ietf-6man-resilient-rs update
Date: Tue, 16 Sep 2014 20:24:56 +0000
Archived-At: http://mailarchive.ietf.org/arch/msg/ipv6/7R1UOQ_N6SaT-udLnHRzYbGCUNs
Cc: "<6man-chairs@tools.ietf.org>" <6man-chairs@tools.ietf.org>, "draft-ietf-6man-resilient-rs@tools.ietf.org" <draft-ietf-6man-resilient-rs@tools.ietf.org>, 6man WG <ipv6@ietf.org>, Suresh Krishnan <suresh.krishnan@ericsson.com>

Hi Erik,

Since it seems you don't agree with the problem statement, I'd suggest asking the Chairs to poll the WG on whether it believes there is a problem to be solved. I agree that unless there is consensus on whether there is a problem, and on what that problem is, discussing solution details is unlikely to lead us to a better state.

Thank you.
________________________________________
From: Erik Nordmark <nordmark@acm.org>
Sent: Thursday, September 11, 2014 11:10 AM
To: Dmitry Anipko; Erik Nordmark; Ole Troan
Cc: <6man-chairs@tools.ietf.org>; draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG; Suresh Krishnan
Subject: Re: draft-ietf-6man-resilient-rs update

On 9/9/14, 1:54 PM, Dmitry Anipko wrote:
> Hello Erik,
>
> sorry for the delay.
>
> Would your (RS or RA) "storm" concern be mitigated if N were made larger (on the order of minutes), and within that time frame the first few RSes the host has to send were required to be unicast? If those got no response, the host would then re-start the algorithm from section 2.

That would merely change when the storm would happen. On, e.g., a WiFi
network with 10,000 hosts that gets flaky (e.g., due to severe packet
congestion or severe radio congestion/interference), the hosts will
place additional load on the network. In general that is a bad idea, in
particular when the added load has a high risk of being synchronized
across the different hosts as in this case - their default router list
entry would time out at the same time.
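
To put rough numbers on the synchronization concern, here is a minimal
sketch. It assumes every host shares the same default router expiry time
and starts re-sending RSes N seconds before it, with N drawn uniformly
from the window (W, 2*W] in whole-second slots. The function name, the
window parameter W and the seed are illustrative assumptions, not
anything from the draft.

```python
import random
from collections import Counter


def rs_restart_histogram(num_hosts, window_s, seed=0):
    """Count first RS transmissions per one-second slot before expiry.

    Assumption: all hosts see the same expiry time, and each host picks
    the one-second slot of (window_s, 2*window_s] in which its N falls.
    """
    rng = random.Random(seed)
    starts = Counter()
    for _ in range(num_hosts):
        slot = rng.randrange(window_s)        # 0 .. window_s - 1
        # Key k means the host restarts between k-1 and k seconds
        # before the shared expiry time.
        starts[window_s + slot + 1] += 1
    return starts


if __name__ == "__main__":
    hist = rs_restart_histogram(num_hosts=10_000, window_s=4)
    for seconds_before_expiry in sorted(hist, reverse=True):
        print(seconds_before_expiry, hist[seconds_before_expiry])
```

Whatever the random draw, 10,000 hosts spread over a 4-second window put
at least 2,500 first RSes into some one-second slot. Widening the window
to 60 seconds lowers that floor to at least 167 per second, but the burst
still lands on a network that is already in trouble.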

If there was a very good reason to make this work, we could work on
it. But I don't think the problem below makes much sense.
>
>>> But I still don't understand (or I don't agree) with the problem you are
> trying to solve.
>
> The problem I'm trying to solve is to reduce the chances that the host is left without a routing configuration when it could otherwise have one. In the specific example I've seen, it was caused by a router misconfiguration, but I can't speak to whether there are other causes in practice.

We definitely want networks to be robust along many axes. But I'm far
from clear that this type of misconfiguration is best handled by having
the hosts change their behavior, in particular if that is likely to cause
additional pain when the network already has problems.

The issue you identified was that a router was configured with
{Min,Max}RtrAdvInterval > AdvDefaultLifetime. I would think that a good
router implementation would just prohibit such a configuration, and
probably issue warnings if AdvDefaultLifetime is less than 2 or 3 times
{Min,Max}RtrAdvInterval; the assumption in RFC 4861 is that several
unsolicited RAs are sent before AdvDefaultLifetime expires.
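
A router-side check along these lines could be sketched as follows. This
is only an illustration: the class, the function name and the warn_factor
parameter are hypothetical, the check uses MaxRtrAdvInterval alone, and
the warning rule (lifetime shorter than warn_factor advertisement
intervals) is an assumed reading of the intent.

```python
from dataclasses import dataclass


@dataclass
class RouterAdvConfig:
    max_rtr_adv_interval: int   # seconds between unsolicited RAs (upper bound)
    adv_default_lifetime: int   # seconds; 0 means "not a default router"


def check_ra_timers(cfg, warn_factor=3):
    """Return (errors, warnings) for the RA timer relationship.

    Assumed policy: reject a lifetime shorter than the advertisement
    interval, and warn when fewer than `warn_factor` unsolicited RAs
    fit inside one lifetime.
    """
    errors, warnings = [], []
    if cfg.adv_default_lifetime == 0:
        # The router does not advertise itself as a default router,
        # so the lifetime cannot expire on hosts.
        return errors, warnings
    if cfg.max_rtr_adv_interval > cfg.adv_default_lifetime:
        errors.append(
            "MaxRtrAdvInterval exceeds AdvDefaultLifetime: hosts will "
            "lose their default router between RAs"
        )
    elif cfg.adv_default_lifetime < warn_factor * cfg.max_rtr_adv_interval:
        warnings.append(
            "AdvDefaultLifetime is less than %d x MaxRtrAdvInterval: "
            "a few lost RAs will expire the default router" % warn_factor
        )
    return errors, warnings


if __name__ == "__main__":
    print(check_ra_timers(RouterAdvConfig(600, 1800)))  # no errors, no warnings
    print(check_ra_timers(RouterAdvConfig(600, 300)))   # one error
    print(check_ra_timers(RouterAdvConfig(600, 1200)))  # one warning
```

A check like this would run when the configuration is loaded or changed,
so the mistake is caught on the router rather than discovered on hosts.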

If implementers need additional advice on this front, we can definitely
have a draft which adds the appropriate SHOULDs or MUSTs in terms of the
constraints on these time values (FWIW RFC 4861 merely defines the
default relationships between the values.)

I haven't heard others comment on the mailing list whether or not they
think we should work on this particular problem of misconfigured
routers, and whether or not folks think we should solve that by changing
the protocol and hosts or providing implementation advice for routers.


Just as a reminder, I do think there are other somewhat related problems
for RS/RA exchanges where we should explore options that can be enabled
on multicast challenged links. But that is different than having the
protocol and hosts change to handle misconfigured routers.

Regards,
     Erik



>
> Thank you.
> ________________________________________
> From: Erik Nordmark <nordmark@acm.org>
> Sent: Thursday, August 21, 2014 11:15 AM
> To: Dmitry Anipko; Erik Nordmark; Ole Troan
> Cc: <6man-chairs@tools.ietf.org>; draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG; Suresh Krishnan
> Subject: Re: draft-ietf-6man-resilient-rs update
>
> On 8/14/14, 5:13 PM, Dmitry Anipko wrote:
>> Hi Erik,
>>
>> After some offline discussion, here is the adjustment of the text we'd like to propose. Please let me know if this sufficiently mitigates your concern.
>>
>> Replace the first paragraph of the section 2.1 of http://tools.ietf.org/id/draft-ietf-6man-resilient-rs-03.txt  with the following:
>>
>> <text>
>> On multicast-capable links, the hosts following this specification
>> SHOULD stop retransmitting the RSs when Router Discovery is successful
>> (i.e. an RA with a non-zero Router Lifetime that results in a default
>> route is received).
>>
>> When and if all of the following conditions are met:
>> 1. The host received an RA with non-zero Router Lifetime since the
>> last attachment of the interface to the link, and has not received
>> later an RA with zero Router Lifetime from the same router.
>> 2. Less than N seconds (where N is randomly chosen by the host at the
>> interface initialization time to be IRT < N <= 2*IRT) remain till the
>> expiration of all of the known default routers on the interface.
> What is IRT?
>> the host MAY re-start sending of the RSs following the Proposed
>> algorithm, described in the section 2.
>> </text>
> This ensures that as long as there is no outage there would be no extra
> RS messages.
>
> However, if there is some outage all the hosts are likely to flood the
> network with RS messages at the same time.
> (For instance, if L2 is fine but some L3 breakage results in the router
> not getting the RAs out.)
>
> The "at the same time" part would be quite bad in a large network e.g.
> 10000 hosts on the link - whether WiFi or cellular.
> One could try to address that by adding a random delay, but that is a
> bit hard because the host doesn't know whether there are 10 or 10,000
> hosts on the link.
>
> Furthermore, the algorithm in section 2 is to send multicast RS. That is
> needed if the last router silently died and a replacement router (with a
> different MAC and IPv6 address) has been put in place instead. 10,000
> multicast RSes at the same time seems bad.
>
> But if the issue is instead that the link isn't very good at delivering
> the multicast RAs and the router is fine, then a unicast RS would suffice.
>
> But I still don't understand (or I don't agree) with the problem you are
> trying to solve.
>
> Is the problem you want to solve a misconfigured router (which sends
> periodic RAs less frequently than the default router lifetime)?
> Is it links where multicast (RAs) are very unreliable, hence the host
> can miss the 3 that should appear before the default router lifetime
> expires?
> Routers which get replaced (without VRRP or HSRP) *and* do not send an
> initial set of multicast RAs?
>
> For links with unreliable (or no) multicast RAs there are much more
> robust ways to do this.
>
>      Erik
>
>> -Dmitry
>>
>>
>> -----Original Message-----
>> From: Erik Nordmark [mailto:nordmark@acm.org]
>> Sent: Thursday, August 14, 2014 4:41 PM
>> To: Dmitry Anipko; Ole Troan
>> Cc: <6man-chairs@tools.ietf.org>; draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG; Suresh Krishnan
>> Subject: Re: draft-ietf-6man-resilient-rs update
>>
>>
>> [Delay in responding due to travel and vacation]
>>
>> On 7/25/14, 5:21 PM, Dmitry Anipko wrote:
>>> Let's separate discussions on the substance and on what exact language we put in the draft.
>>>
>>> Substance-wise - IMO, the change tries to reduce the existing fragility of RD in some situations. We had real-world support cases with one of the top mobile operators in the US, where the network was not sending new RAs before the info in the initial one expired. Yes, this was a misconfiguration / bug on the network side. Yes, this is where it was eventually fixed, and yes, perhaps the support teams should not have taken as much time to figure it out. But the end-users don't care about that part, and they would be better off in the meantime if the protocol had more resiliency against misconfiguration or packet loss. Speaking for myself only, this is the scenario I'm concerned about. I'm sure there are others who know the "no multicast RAs, period" scenario better than I do, and they can speak to that.
>> Dmitry,
>> I don't understand how we can make the protocol specification more robust against the above type of misconfiguration - at least not without ending up with a much chattier protocol.
>>
>> For instance, suppose we say that hosts should send a unicast RS every
>> 10 seconds. That would be quite chatty and would still result in the same failures if the operator accidentally configures the router (and/or
>> prefix) lifetime to be 5 seconds.
>> And having the hosts look at the remaining lifetimes and decide to unicast (or multicast) RSes closer and closer together as the
>> lifetime(s) get closer to expiry might result in an explosion of RS messages during a network outage.
>>
>>       Erik
>>
>>> So I think it would be good to first understand, leaving specific RFC language aside, whether there is consensus to improve RD reliability when the host sees that previously received information has expired or is about to expire. My impression from the discussion in London was that there was such consensus - but perhaps I misunderstood. Letting the host do additional rate-limited transmissions to solve that looks to me like a simple and reasonable approach. If you agree with that, then what's the specific concern with putting that as a MAY?
>>>
>>> I do disagree with the viewpoint that, because ND's use of multicast is inefficient, we should not make any improvements to ND until the multicast issue is solved. Yes, the multicast-related issues in general do need to be addressed over the long term. But addressing them will take time, and there are issues today which end-users are hitting, for which incremental limited fixes within the existing framework are possible and reasonable.
>>>
>>>>> today, links without periodic RAs are not supported.
>>> Since this is a statement about today, it applies (or does not apply) equally to the option 2 you suggested and the option I suggested, so it's not a differentiator. Furthermore, it is basically equivalent to option 1, which makes it not an independent argument but a restatement of option 1, so it can't be used as a differentiator there either.
>>>
>>> -----Original Message-----
>>> From: Ole Troan [mailto:otroan@employees.org]
>>> Sent: Thursday, July 24, 2014 8:25 PM
>>> To: Dmitry Anipko
>>> Cc: Suresh Krishnan; <6man-chairs@tools.ietf.org>;
>>> draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG
>>> Subject: Re: draft-ietf-6man-resilient-rs update
>>>
>>>>>> wouldn't those hosts initially do router discovery, then not ever send RS again, and would be left without a default router when the router lifetime expired?
>>>> Assuming there are no other events, such as media state change - correct, and that's the behavior they get today. As I understand it, this MAY is about a specific behavior improvement, compared to today's state, which a vendor may choose to implement.
>>> today, links without periodic RAs are not supported.
>>>
>>> cheers,
>>> Ole
>>>
>>>> -----Original Message-----
>>>> From: Ole Troan [mailto:otroan@employees.org]
>>>> Sent: Wednesday, July 23, 2014 8:01 PM
>>>> To: Dmitry Anipko
>>>> Cc: Suresh Krishnan; <6man-chairs@tools.ietf.org>;
>>>> draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG
>>>> Subject: Re: draft-ietf-6man-resilient-rs update
>>>>
>>>> Dmitry,
>>>>
>>>>>>> what would happen with the hosts that didn't implement the feature on links w/o periodic multicast RAs?
>>>>> It would not get this particular improvement in behavior compared to the pre-draft state. But since the answer seems to be trivial, I'm not sure I understood the question correctly?
>>>> wouldn't those hosts initially do router discovery, then not ever send RS again, and would be left without a default router when the router lifetime expired?
>>>>
>>>> cheers,
>>>> Ole
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Ole Troan [mailto:otroan@employees.org]
>>>>> Sent: Tuesday, July 22, 2014 7:15 PM
>>>>> To: Dmitry Anipko
>>>>> Cc: Suresh Krishnan; <6man-chairs@tools.ietf.org>;
>>>>> draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG
>>>>> Subject: Re: draft-ietf-6man-resilient-rs update
>>>>>
>>>>> Dmitry,
>>>>>
>>>>>> Given the definition of MAY in RFC 2119, why is the wording, which is currently in the text, not an option?
>>>>>>
>>>>>> <quote>
>>>>>> In the
>>>>>> same vein an implementation which does include a particular option
>>>>>> MUST be prepared to interoperate with another implementation which
>>>>>> does not include the option (except, of course, for the feature the
>>>>>> option provides.) </quote>
>>>>>>
>>>>>> Where "the feature" would be e.g. "supporting links w/o periodic multicast RAs"?
>>>>> what would happen with the hosts that didn't implement the feature on links w/o periodic multicast RAs?
>>>>>
>>>>> cheers,
>>>>> Ole
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Ole Troan [mailto:otroan@employees.org]
>>>>>> Sent: Tuesday, July 22, 2014 2:10 PM
>>>>>> To: Suresh Krishnan
>>>>>> Cc: Dmitry Anipko; <6man-chairs@tools.ietf.org>;
>>>>>> draft-ietf-6man-resilient-rs@tools.ietf.org; 6man WG
>>>>>> Subject: Re: draft-ietf-6man-resilient-rs update
>>>>>>
>>>>>> Suresh, et al,
>>>>>>
>>>>>> to pick up this after my holiday.
>>>>>>
>>>>>>>>> My impression from the discussion in London was that the consensus was closer to (2) than (1) - that on such links the hosts should be allowed to restart transmissions in a reasonable / rate-limited manner, when connectivity parameters expire. And that we should add MAY text for that.
>>>>>>>> that's also fine with me. if there was a text proposal it would most likely be easier for the working group to come to a decision.
>>>>>>>> is that something the authors can take on?
>>>>>>> In my view, the text in the draft already reflects something akin to what Dmitry was talking about (but not exactly option 2 from your original mail on this thread). This is the relevant text from the draft.
>>>>>>> Hosts MAY continue retransmitting the RSs even after router discovery is successful.  If the host continues to retransmit RSs, it is RECOMMENDED that such retransmissions be rate-limited to one every MRT.
>>>>>> the current text states: hosts SHOULD stop, but MAY continue.
>>>>>> on a link where there are no periodic multicast RAs, all the hosts that don't implement the MAY will fail after the initial Router lifetime seconds.
>>>>>>
>>>>>> to me it appears we have two options.
>>>>>> 1) only solve the initial connection case.
>>>>>> remove any mention of links without periodic RAs (remove the last sentence of the abstract, and bullet 1b.).
>>>>>>     possibly remove the MAY in the "SHOULD stop, but MAY continue"
>>>>>>     advance the document to the IESG
>>>>>> 2) take the document back to the WG and solve the problems that Erik outlined in his mail.
>>>>>>
>>>>>> cheers,
>>>>>> Ole
>>> --------------------------------------------------------------------
>>> IETF IPv6 working group mailing list
>>> ipv6@ietf.org
>>> Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
>>> --------------------------------------------------------------------
>>>
>