Re: [GROW] Handling of LAGs in Mitigating Negative Impact of Maintenance through BGP Session Culling

Joel Jaeggli <joelja@bogus.com> Tue, 09 January 2018 16:33 UTC

From: Joel Jaeggli <joelja@bogus.com>
X-Mailer: iPhone Mail (15C202)
In-Reply-To: <20180109155039.GD59807@vurt.meerval.net>
Date: Tue, 09 Jan 2018 08:33:07 -0800
Cc: Will Hargrave <will@harg.net>, grow@ietf.org, draft-ietf-grow-bgp-session-culling@ietf.org
Content-Transfer-Encoding: quoted-printable
Message-Id: <C87317FA-4598-4E73-A635-902A30B8B68D@bogus.com>
References: <8BB20DB3-61E9-4CAC-B33B-B18CA12C2591@de-cix.net> <20180109113506.GA99435@vurt.meerval.net> <53E19D26-D4C0-4722-8CFE-FDC5BF5C3FBC@harg.net> <20180109155039.GD59807@vurt.meerval.net>
To: Job Snijders <job@instituut.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/grow/VGcMqSpYfYIf7JRl3O8r7-fkLF4>
Subject: Re: [GROW] Handling of LAGs in Mitigating Negative Impact of Maintenance through BGP Session Culling



> On Jan 9, 2018, at 07:50, Job Snijders <job@instituut.net> wrote:
> 
>> On Tue, Jan 09, 2018 at 03:34:46PM +0000, Will Hargrave wrote:
>> On 9 Jan 2018, at 11:35, Job Snijders wrote:
>>>> Our suggestion for handling LAGs looks like this: Typically, a
>>>> minimum number of port members can be defined for a LAG being up.
>>>> The LAG is not touched by BGP Session Culling during a maintenance
>>>> unless this number is undercut. If the number is undercut, the LAG
>>>> is handled by BGP Session Culling as defined in the Internet
>>>> Draft.

This sounds like an implementation-specific detail to me. I can think of several variants on selecting which BGP sessions to cull based on criteria that might be exchange-specific, such as culling peers when inter-switch or inter-site bandwidth drops below a certain threshold.

I’m not sure that I would recommend any of these; however, the specifics of a service offering are up to the operator.
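
To make the bandwidth-threshold variant concrete, here is a minimal sketch in Python. The function name, its parameters, and the default fraction of 0.5 are assumptions made up for illustration; nothing here comes from the draft, and an operator would pick their own criterion.

```python
def should_cull_for_bandwidth(remaining_gbps, nominal_gbps, min_fraction=0.5):
    """Return True when remaining inter-switch capacity is below the threshold.

    min_fraction is a hypothetical, operator-chosen share of the nominal
    capacity that must stay available during the maintenance.
    """
    if nominal_gbps <= 0:
        raise ValueError("nominal_gbps must be positive")
    return remaining_gbps / nominal_gbps < min_fraction
```

With these assumed numbers, a 400G inter-site path reduced to 100G would trigger culling, while one reduced to 300G would not.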

>>>> 
>>>> If no value for the minimum number of active port members is
>>>> defined for a LAG, the value 1 should be used as this is the
>>>> behaviour of LAGs today already.
>>> 
>>> Is this in context of multi-chassis LAG?
>> 
>> I think if we include anything about LAGs we should make it very clear
>> that you must apply the culling ACL to *either* all ports of a LAG
>> *or* none.  Applying it to half of an MCLAG could be disastrous.
> 
> Will, my reading of Thomas' message is slightly different, I don't think
> he is proposing to apply culling ACLs on half the ports of a LAG, he
> proposes that culling ACLs are only applied (to all ports) when more
> than X members of a LAG will be impacted by the maintenance (where X is
> an ixp-participant configurable number).

The same approach applies across multiple line cards in a large switch, or indeed might involve the loss of ports to which the client is not directly connected.
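
As a rough illustration of the LAG rule discussed above, the sketch below applies the culling ACL to every member port of a LAG, or to none, depending on whether the maintenance leaves fewer active members than a configured minimum. The `Lag` class, the `ports_to_cull` function, and the port names are hypothetical; the default of 1 follows the quoted proposal. The set of ports under maintenance may also contain ports that are not LAG members.

```python
from dataclasses import dataclass

# Default from the quoted proposal when no minimum is configured.
DEFAULT_MIN_LINKS = 1


@dataclass
class Lag:
    name: str
    members: frozenset  # names of the member ports
    min_links: int = DEFAULT_MIN_LINKS


def ports_to_cull(lag, ports_in_maintenance):
    """Return the ports that should receive the culling ACL.

    All-or-none: either every member port of the LAG, or an empty set.
    """
    # Members that stay up; non-member ports in the maintenance set are ignored.
    remaining = lag.members - set(ports_in_maintenance)
    if len(remaining) < lag.min_links:
        return set(lag.members)
    return set()
```

For a four-port LAG with a minimum of two, taking two ports into maintenance leaves the LAG untouched, while taking three applies the ACL to all four ports.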

> 
>> I didn’t realise there were IXPs using MC-LAG. Discovering this may
>> surprise some members.
> 
> Thomas, in terms of IETF logistics, the RFC-To-Be draft-ietf-grow-bgp-session-culling
> document has already been submitted to the RFC Editor and is in the
> queue, information on LAGs cannot be added at this point to the
> publication.
> 
> However, since this is a BCP, the next iteration could include
> additional information as our understanding of the culling practise
> improves, and the BCP number wouldn't change of course.
> 
> From what I read in your message your organisation is still in the early
> phases of applying the 'culling' mechanism. I recommend you to (over
> time) carefully take notes of the interaction between LAG and culling,
> and whatever arises as the best current practise is documented in the
> next revision of the BCP.
> 
> Kind regards,
> 
> Job
> 