Re: Can a BFD session change its source port to facilitate auto recovery

Abhinav Srivastava <absrivas@gmail.com> Thu, 23 March 2023 14:19 UTC

From: Abhinav Srivastava <absrivas@gmail.com>
Date: Thu, 23 Mar 2023 07:18:48 -0700
Message-ID: <CAL9v8R3siKGhK_gwWH9COFDRwb1-LYukHe1JxYOyyC5=3mxetw@mail.gmail.com>
Subject: Re: Can a BFD session change its source port to facilitate auto recovery
To: Jeff Tantsura <jefftant.ietf@gmail.com>
Cc: Greg Mirsky <gregimirsky@gmail.com>, rtg-bfd@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-bfd/P_zLH5g-98wHvMVGaFbufxXm-Yk>

The case I had in mind is one where multi-hop BFD is used to monitor the
availability of remote servers. There are many equal-cost paths to reach
them, especially in a DC. BFD detecting network issues is only incidental
there. Even if the session recovers, it can leave a monitoring/alerting
trail, and if it happens often, it would not (and should not) be ignored.
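Both the "many equal-cost paths" point here and Jeff's entropy point quoted below rest on per-flow hashing: each router typically picks one equal-cost next hop from header fields such as the 5-tuple, so a BFD session with a fixed source port stays on one path. A minimal sketch of that idea, assuming a hypothetical `pick_next_hop` helper and SHA-256 as a stand-in for whatever (vendor-specific) hash and field selection real hardware uses:

```python
import hashlib
from typing import List, NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int     # IP protocol number; 17 = UDP
    src_port: int
    dst_port: int


def pick_next_hop(flow: FiveTuple, next_hops: List[str]) -> str:
    """Deterministically map a flow to one of several equal-cost next hops.

    Illustrative only: SHA-256 stands in for a router's real hash function.
    """
    key = "|".join(str(value) for value in flow).encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:8], "big") % len(next_hops)]


if __name__ == "__main__":
    hops = ["nh-a", "nh-b", "nh-c", "nh-d"]  # hypothetical next-hop names
    for sport in range(49152, 49160):
        # 4784 is the multi-hop BFD destination port (RFC 5883).
        flow = FiveTuple("192.0.2.1", "198.51.100.7", 17, sport, 4784)
        print(sport, pick_next_hop(flow, hops))
```

The same 5-tuple always lands on the same next hop, and only a change in one of the hashed fields (here, the source port) can move it. With a single next hop the result never changes, which matches Jeff's point that hashing only matters where ECMP or a LAG is in the path.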

I take your point that most applications would only experience latency
without dropping their TCP connections. In that case, I suppose BFD is what
actually gets them disconnected (e.g., by bringing down directly associated
protocols such as BGP, or by causing a load balancer in the path to direct
packets to the wrong server). Continuous flapping is the flip side, though.
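The recovery trigger proposed in the quoted thread (detection-timer expiry, optionally gated on how long the session has stayed down or on prolonged flapping) could be expressed roughly as follows. Everything here is an assumption for illustration: the `RecoveryPolicy` class, its thresholds, and the sliding-window flap count are hypothetical. The sketch only decides when to act; because RFC 5881 section 4 requires one source port per session, what acting means (for example, recreating the session) is left open.

```python
from collections import deque
from typing import Deque, Optional


class RecoveryPolicy:
    """Hypothetical policy: decide when a multi-hop BFD session should be
    re-established, based on how long it has been down and on how often
    its detection timer has expired recently. Thresholds are illustrative."""

    def __init__(self, max_down_seconds: float = 30.0,
                 max_flaps: int = 5, flap_window_seconds: float = 60.0) -> None:
        self.max_down_seconds = max_down_seconds
        self.max_flaps = max_flaps
        self.flap_window_seconds = flap_window_seconds
        self._timeouts: Deque[float] = deque()  # detection-timer expiry times

    def _prune(self, now: float) -> None:
        # Forget expiries that have slid out of the flap window.
        while self._timeouts and now - self._timeouts[0] > self.flap_window_seconds:
            self._timeouts.popleft()

    def on_detection_timeout(self, now: float) -> None:
        """Record that the detection timer expired (session went Down)."""
        self._timeouts.append(now)
        self._prune(now)

    def should_recover(self, now: float, down_since: Optional[float]) -> bool:
        """True if the session has stayed down too long or is flapping."""
        self._prune(now)
        stuck_down = down_since is not None and now - down_since >= self.max_down_seconds
        flapping = len(self._timeouts) >= self.max_flaps
        return stuck_down or flapping
```

A sliding window rather than a lifetime counter keeps a session that flapped a few times long ago from being treated as unstable today.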

Thanks
Abhinav


On Wed, 22 Mar, 2023, 11:27 pm Jeff Tantsura, <jefftant.ietf@gmail.com>
wrote:

> Abhinav,
>
> Let’s clarify a couple of points.
> What you are trying to do is change the entropy in order to change the
> local hashing outcome. However, for hashing to even be relevant, there has
> to be either ECMP or a LAG in the path to the destination; otherwise the
> shortest path will be used regardless. So, statistically, some of the flows
> (5-tuples) between a given pair of endpoints will be traversing the
> (partially) broken link. Would you really like BFD to “pretend” that
> everything is just fine?
> Moreover, in case of congestion, by far most applications won’t change
> their ports; they will have their TX rate reduced instead.
> There’s work done by Tom Herbert for IPv6/TCP (a kernel patch upstreamed a
> few years ago, presented in RTGWG pre-Covid) that changes the flow label
> value on RTO (which some devices might or might not include in hashing).
> It is strongly not recommended outside of a tightly controlled, homogeneous
> environment (think within a DC).
> Beyond what the BFD spec tells us (don’t), the above should provide enough
> motivation not to do this.
>
> Cheers,
> Jeff
>
> On Mar 23, 2023, at 05:44, Abhinav Srivastava <absrivas@gmail.com> wrote:
>
>
> Multi-hop BFD would be the mechanism that detects the failure on whichever
> path the session happens to be using. I wasn't thinking of another
> mechanism. Detection-timer expiry would be the trigger for recovery, which
> could be augmented with a few other criteria, such as how long the session
> has been unable to come back up, or prolonged flapping.
>
> Thanks
> Abhinav
>
> On Wed, 22 Mar, 2023, 3:05 pm Greg Mirsky, <gregimirsky@gmail.com> wrote:
>
>> Hi Abhinav,
>> thank you for presenting an interesting scenario for discussion. I have
>> a couple of questions to better understand it:
>>
>>    - How is the network failure that triggers the recovery process
>>    detected?
>>    - If the failure detection mechanism is not multi-hop BFD, what is
>>    the relationship between the detection intervals of that mechanism and
>>    the multi-hop BFD session?
>>
>> Regards,
>> Greg
>>
>> On Wed, Mar 22, 2023 at 4:36 PM Abhinav Srivastava <absrivas@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I'd like clarification on whether the source port can be changed for a
>>> multi-hop BFD session. The ability to change the BFD source port when the
>>> session goes down would help the session recover if it is stuck on a
>>> network path with intermittent but significant packet loss.
>>>
>>> In such cases, without BFD, end-to-end application traffic would
>>> normally settle on a good path eventually, as applications typically
>>> change their source port after experiencing disconnections or failures.
>>> But if BFD is being used to monitor part of a path that is experiencing
>>> significant but not 100% packet loss, it will cause the next-hop list of
>>> the associated static route, or the associated BGP sessions, to flap
>>> forever, because the BFD packets stay stuck on that partially lossy path
>>> (until the BFD session is deleted and recreated by admin action). This
>>> may also hinder the typical application recovery strategy of changing the
>>> source port on failure.
>>>
>>> The ability to change the BFD source port dynamically could help BFD
>>> recover in such cases. Is this allowed by the RFCs? RFC 5881, section 4
>>> (for the single-hop case), states:
>>>
>>> *“The source port MUST be in the range 49152 through 65535. The same UDP
>>> source port number MUST be used for all BFD Control packets associated with
>>> a particular session”*
>>>
>>> Thanks
>>>
>>> Abhinav
>>>
>>
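The two requirements quoted from RFC 5881 section 4 can be read as a per-session check. A minimal sketch, assuming observed packets are available as hypothetical `(session_id, src_port)` pairs, where `session_id` stands for whatever identifies a session locally (for example, its discriminator). Under this check, a source port change within one session is reported as a violation.

```python
from typing import Dict, Iterable, List, Tuple

# Range from RFC 5881 section 4: "The source port MUST be in the range
# 49152 through 65535."
BFD_SRC_PORT_MIN = 49152
BFD_SRC_PORT_MAX = 65535


def check_source_ports(packets: Iterable[Tuple[int, int]]) -> List[str]:
    """Return human-readable violations of the two quoted rules."""
    first_port: Dict[int, int] = {}  # session_id -> first source port seen
    problems: List[str] = []
    for session_id, port in packets:
        if not BFD_SRC_PORT_MIN <= port <= BFD_SRC_PORT_MAX:
            problems.append(f"session {session_id}: port {port} out of range")
        expected = first_port.setdefault(session_id, port)
        if port != expected:
            problems.append(f"session {session_id}: port changed {expected} -> {port}")
    return problems


if __name__ == "__main__":
    print(check_source_ports([(1, 50000), (1, 50000), (1, 50001)]))
```

The check keys on the first port seen per session, so it cannot distinguish a mid-session port change from a session that was torn down and recreated under the same identifier; a real monitor would need to reset its state when a session is deleted.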