Re: Can a BFD session change its source port to facilitate auto recovery

Greg Mirsky <gregimirsky@gmail.com> Thu, 23 March 2023 20:08 UTC

To: Abhinav Srivastava <absrivas@gmail.com>
Cc: Jeff Tantsura <jefftant.ietf@gmail.com>, rtg-bfd@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/ut4AGA7BCwYyFnOMWycNEv7ZQms>

Hi Abhinav,
although BFD is not expected to detect network performance issues like
congestion, some of them may be reported as a network failure if an
aggressive Detection Time is used on a particular BFD session. On the other
hand, using aggressive detection intervals on a multi-hop (MH) BFD session
may not be the best operational practice. Instead, as was suggested,
single-hop (SH) and LAG BFD sessions can be run with more aggressive
detection intervals, while the MH BFD session is set with more relaxed
intervals that meet the operator's expectations. In fact, such an
arrangement can be viewed as a case of multi-layer OAM in a domain. And
yes, selecting interval values for that case is, in part, "black magic".

Regards,
Greg
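The timer layering described above can be made concrete with the Asynchronous-mode Detection Time rule of RFC 5880, Section 6.8.4: the Detect Mult received from the remote system multiplied by the agreed interval, i.e. the greater of the local bfd.RequiredMinRxInterval and the last received Desired Min TX Interval. The Python sketch below is illustrative only: the interval and multiplier values are hypothetical, and the "SH/LAG must detect before MH" check is an assumption drawn from the multi-layer OAM idea, not an RFC requirement.

```python
from dataclasses import dataclass


@dataclass
class BfdTimers:
    """Timer inputs for one BFD session, in microseconds as in RFC 5880."""
    local_required_min_rx_us: int   # local bfd.RequiredMinRxInterval
    remote_desired_min_tx_us: int   # last received Desired Min TX Interval
    remote_detect_mult: int         # Detect Mult received from the remote system


def detection_time_us(t: BfdTimers) -> int:
    """Asynchronous-mode Detection Time (RFC 5880, Section 6.8.4)."""
    agreed_interval_us = max(t.local_required_min_rx_us, t.remote_desired_min_tx_us)
    return t.remote_detect_mult * agreed_interval_us


def layered_ok(lower: BfdTimers, upper: BfdTimers) -> bool:
    """Hypothetical check: the lower layer (SH/LAG) should detect a failure
    before the upper layer (MH) does, so its Detection Time must be shorter."""
    return detection_time_us(lower) < detection_time_us(upper)


# Hypothetical values: aggressive SH/LAG session, relaxed MH session.
sh_lag = BfdTimers(50_000, 50_000, 3)
mh = BfdTimers(300_000, 300_000, 5)

print(detection_time_us(sh_lag) // 1000, "ms")  # 150 ms
print(detection_time_us(mh) // 1000, "ms")      # 1500 ms
print(layered_ok(sh_lag, mh))                   # True
```

Because of the max(), a peer that advertises a slower Desired Min TX Interval lengthens the Detection Time beyond what the local settings alone would suggest.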

On Thu, Mar 23, 2023 at 7:18 AM Abhinav Srivastava <absrivas@gmail.com>
wrote:

> The case I had in mind is where multi-hop BFD is being used to monitor
> the availability of remote servers. There are many equal-cost paths to
> reach them, especially in a DC. BFD detecting network issues is only
> incidental there. And even if it recovers, it can leave a
> monitoring/alerting trail, which, if it happens often, would/should not be
> ignored.
>
> I take your point about most applications only experiencing latency
> without dropping their TCP connections. I guess BFD in that case is helping
> them get disconnected (e.g., directly associated protocols like BGP, or
> causing a load balancer in the path to direct packets to the wrong
> server). Though continuous flapping is the flip side.
>
> Thanks
> Abhinav
>
>
> On Wed, 22 Mar, 2023, 11:27 pm Jeff Tantsura, <jefftant.ietf@gmail.com>
> wrote:
>
>> Abhinav,
>>
>> Let’s clarify a couple of points.
>> What you are trying to do is to change entropy to change the local
>> hashing outcome. However, for hashing to even be relevant, there has to be
>> either ECMP or LAG in the path to the destination; otherwise the shortest
>> path will be used regardless. So, statistically, some of the flows between
>> a given pair of endpoints (5-tuple) will be traversing the (partially)
>> broken link; would you really like BFD to “pretend” that everything is
>> just fine?
>> Moreover, in case of congestion, by far most applications won’t change
>> their ports but will have their TX rate reduced.
>> There’s work done by Tom Herbert for IPv6/TCP (a kernel patch upstreamed a
>> few years ago, presented in RTGWG pre-Covid) that changes the flow label
>> value on RTO (which some might or might not include in hashing); it is
>> strongly not recommended for use outside of a tightly controlled
>> homogeneous environment (think within a DC).
>> Apart from what the BFD spec tells us (don’t), the above should provide
>> enough motivation not to do this.
>>
>> Cheers,
>> Jeff
>>
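The entropy point above can be illustrated with a toy ECMP selector that hashes the 5-tuple and takes the result modulo the number of equal-cost paths. This Python sketch is an assumption throughout: real routers and LAG implementations use vendor-specific hash functions, fields, and seeds, and SHA-256 is used here only because it is deterministic and in the standard library. The addresses are hypothetical documentation addresses; UDP destination port 4784 is the one RFC 5883 assigns to multi-hop BFD.

```python
import hashlib


def pick_path(src_ip: str, dst_ip: str, proto: int,
              src_port: int, dst_port: int, n_paths: int) -> int:
    """Toy ECMP/LAG member selection from a 5-tuple (illustrative only)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


SRC, DST = "192.0.2.1", "198.51.100.7"   # hypothetical endpoints
UDP, BFD_MH_PORT = 17, 4784

# The same 5-tuple always lands on the same path...
assert pick_path(SRC, DST, UDP, 49152, BFD_MH_PORT, 4) == \
       pick_path(SRC, DST, UDP, 49152, BFD_MH_PORT, 4)

# ...while different source ports spread across the available paths.
for sport in range(49152, 49160):
    print(sport, "-> path", pick_path(SRC, DST, UDP, sport, BFD_MH_PORT, 4))
```

A session pinned to one source port therefore always rides the same member, and a partially broken member keeps failing it. A new port only re-rolls the hash; other flows between the same endpoints will still land on the bad member, which is the situation Jeff argues BFD should not mask.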
>> On Mar 23, 2023, at 05:44, Abhinav Srivastava <absrivas@gmail.com> wrote:
>>
>> Multi-hop BFD would be the mechanism that detects the failure on the path
>> it happens to be using for the session. I wasn't thinking of another
>> mechanism. Detection timer expiry would be the trigger for recovery, which
>> could be augmented with a few other criteria, such as how long the session
>> has been unable to come back up, or prolonged flapping.
>>
>> Thanks
>> Abhinav
>>
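The trigger described above (Detection Time expiry, augmented with criteria such as how long the session has stayed down or prolonged flapping) could be sketched as a small policy object. Everything here is hypothetical: the class, method names, and thresholds are illustrative assumptions, and the sketch only decides when a recovery would fire, not what the recovery does; changing the source port of a live session conflicts with the RFC 5881 text quoted in the original message.

```python
from collections import deque


class RecoveryTrigger:
    """Hypothetical policy: fire when a session has been Down too long or
    has gone Up->Down too often within a sliding time window."""

    def __init__(self, max_down_s: float = 30.0, max_flaps: int = 5,
                 flap_window_s: float = 60.0):
        self.max_down_s = max_down_s
        self.max_flaps = max_flaps
        self.flap_window_s = flap_window_s
        self.down_since = None   # time the session went Down, None while Up
        self.flaps = deque()     # timestamps of Up->Down transitions

    def _prune(self, now: float) -> None:
        # Drop transitions that have aged out of the sliding window.
        while self.flaps and now - self.flaps[0] > self.flap_window_s:
            self.flaps.popleft()

    def on_down(self, now: float) -> None:
        """Call when the Detection Time expires (session goes Up -> Down)."""
        self.down_since = now
        self.flaps.append(now)
        self._prune(now)

    def on_up(self, now: float) -> None:
        """Call when the session returns to Up."""
        self.down_since = None

    def should_recover(self, now: float) -> bool:
        self._prune(now)
        stuck_down = (self.down_since is not None
                      and now - self.down_since >= self.max_down_s)
        flapping = len(self.flaps) >= self.max_flaps
        return stuck_down or flapping


# Usage: three flaps within 60 s trip a threshold of three.
trigger = RecoveryTrigger(max_flaps=3)
for t in (0, 10, 20):
    trigger.on_down(t)
    trigger.on_up(t + 1)
print(trigger.should_recover(21))   # True
```

The sliding window lets old flaps age out, so a session that stabilizes stops counting as flapping without any manual reset.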
>> On Wed, 22 Mar, 2023, 3:05 pm Greg Mirsky, <gregimirsky@gmail.com> wrote:
>>
>>> Hi Abhinav,
>>> thank you for presenting an interesting scenario for a discussion. I
>>> have several questions to better understand it:
>>>
>>>    - How is the network failure that triggers the recovery process
>>>    detected?
>>>    - If the failure detection mechanism is not multi-hop BFD, what is
>>>    the relationship between the detection intervals of that mechanism and
>>>    the multi-hop BFD session?
>>>
>>> Regards,
>>> Greg
>>>
>>> On Wed, Mar 22, 2023 at 4:36 PM Abhinav Srivastava <absrivas@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I need clarification on whether the source port can be changed for a
>>>> multi-hop BFD session. The ability to change the BFD source port when the
>>>> BFD session goes down would help the session recover if it is stuck on a
>>>> network path with intermittent but significant packet loss.
>>>>
>>>> In such cases, without BFD, end-to-end application traffic would
>>>> normally settle on a good path eventually, as applications typically
>>>> change their source port after experiencing disconnections or failures.
>>>> But if BFD is being used to monitor part of a path that is experiencing
>>>> significant but not 100% packet loss, it will cause the next-hop list of
>>>> the associated static route, or the associated BGP sessions, to flap
>>>> forever, as the BFD packets would be stuck on that partially lossy path
>>>> forever (until the BFD session is deleted and recreated by admin action).
>>>> This may also hinder the typical application recovery strategy of
>>>> changing the source port on failure.
>>>>
>>>> The ability to dynamically change the BFD source port could help BFD
>>>> recover in such cases. Is this allowed by the RFC? RFC 5881, Section 4
>>>> (the single-hop case), states:
>>>>
>>>> *“The source port MUST be in the range 49152 through 65535. The same
>>>> UDP source port number MUST be used for all BFD Control packets associated
>>>> with a particular session”*
>>>>
>>>> Thanks
>>>> Abhinav
>>>>
>>>
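The RFC 5881 rule quoted above amounts to two constraints: choose the source port from 49152 through 65535, and keep it for the lifetime of the session (RFC 5883 reuses the same encapsulation for multi-hop BFD, with UDP destination port 4784). The minimal Python sketch below assumes a hypothetical `BfdSession` class and `recreate` helper, not taken from any real implementation. It makes the port read-only, so the only way to "change" it is to replace the session with a new one, which is what the admin action mentioned above amounts to.

```python
import random
from typing import Optional

BFD_SRC_PORT_MIN = 49152   # RFC 5881, Section 4
BFD_SRC_PORT_MAX = 65535


class BfdSession:
    """Hypothetical session object whose UDP source port is fixed at creation."""

    def __init__(self, peer: str, rng: Optional[random.Random] = None):
        rng = rng or random.Random()
        self.peer = peer
        # randint() is inclusive on both ends, matching the RFC's range.
        self._src_port = rng.randint(BFD_SRC_PORT_MIN, BFD_SRC_PORT_MAX)

    @property
    def src_port(self) -> int:
        # No setter: every Control packet of this session uses this port.
        return self._src_port


def recreate(old: BfdSession, rng: Optional[random.Random] = None) -> BfdSession:
    """Hypothetical helper: replace a session with a brand-new one toward the
    same peer. The new session picks its own (possibly different) port."""
    return BfdSession(old.peer, rng)


s = BfdSession("198.51.100.7", random.Random(7))
assert BFD_SRC_PORT_MIN <= s.src_port <= BFD_SRC_PORT_MAX
try:
    s.src_port = 50000
except AttributeError:
    print("source port is fixed for the session's lifetime")
```

A recreated session starts over from the Down state (RFC 5880 initializes bfd.SessionState to Down), so this is a full session restart rather than a transparent port change.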