RE: IPv6 Anycast has been killed by LINUX patch in 2016 - who cares?

Vasilenko Eduard <vasilenko.eduard@huawei.com> Tue, 10 August 2021 13:24 UTC

From: Vasilenko Eduard <vasilenko.eduard@huawei.com>
To: Gyan Mishra <hayabusagsm@gmail.com>
CC: 6man WG <ipv6@ietf.org>, IETF discussion list <ietf@ietf.org>, Phillip Hallam-Baker <phill@hallambaker.com>, Theodore Ts'o <tytso@mit.edu>, Töma Gavrichenkov <ximaera@gmail.com>
Subject: RE: IPv6 Anycast has been killed by LINUX patch in 2016 - who cares?
Date: Tue, 10 Aug 2021 13:24:07 +0000
Message-ID: <16f0deadd95a4b84b422ecdb87864c9b@huawei.com>
References: <CALZ3u+aP=v_1=w1xqfEKof7Cc6Ba3pwOYV3O=0b=NxS4hRWhiA@mail.gmail.com> <YRBdZrKV+MrrhUCG@mit.edu> <CALZ3u+aBdE3Bw3_ry+CuV4tS016c4mWewJFpr0aCbBnwj70Vzg@mail.gmail.com> <a3833e04-c123-ef52-95f9-cae80a1390e7@foobar.org> <CAMm+LwiAbiK618+kY9JTLr7_mQd-E5TKyNsGqOLrGQoLzjJo=A@mail.gmail.com> <CALZ3u+bLVUZf1fTHQvAVzOnToiPcsXEyTNt56hNAXz4=-G5-6w@mail.gmail.com> <CAHw9_i+k9x1g3bcst6rHcXpesEVwnPtV6DzsFAxi8dC6CRMZPw@mail.gmail.com> <CALx6S346mqNaE+s1DH7S7RutTpzfrC5oX1No5Jb72sTvVQjtpQ@mail.gmail.com> <CAHw9_i+ELJS_xqcEHM4raq+f=PZ5yw1ptfG3a6VypZmWTo11-A@mail.gmail.com> <CAOj+MMGzWq1OrwBQW_Mz4gB+z9wJSdQnFCkTmWiHi_Tm3ty47g@mail.gmail.com> <YRHx4c8/nOh5aXN1@mit.edu> <CABNhwV1HdSrzHDLhuSMaWY+9UaHnFYaYo75fN3+JMgMnf+Pnhw@mail.gmail.com> <CALZ3u+Z4XYf0gLrhsA5D1pJz5O2Wn6fpBugh6LeTOkGb9Pn=7A@mail.gmail.com> <CABNhwV3rAjueGD_vKgoTc7egF9RzDXbswTibOZYb50da3H8Ljw@mail.gmail.com> <301d978d8c27427f954af79070fe5741@huawei.com> <CABNhwV3_X5OC1p-191r8scCX3yWDv_H0xGVDh0sUQfSzLWonUA@mail.gmail.com>
In-Reply-To: <CABNhwV3_X5OC1p-191r8scCX3yWDv_H0xGVDh0sUQfSzLWonUA@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/gX2zJQNaZVvD19e29nfASwKC4Ow>

Hi Gyan,
I am not sure why Linux originally started refreshing the flow label on RTO in 2014-2016.
But the presentation at the last IETF meeting that attracted my attention was about how to mitigate this type of vendor bug: loss of sync between the control plane and the data plane, which prolongs the “silent drop”.

As I stated in my first message, the problem is that this workaround has become the default for the whole Internet.
I agree that it should not happen by default after any number of RTOs.

But even after activation, it makes sense to give the IGP a chance to repair the problem.
IMHO: the RTO timer should be configurable, or at least 1 s, to give OSPF a chance.
Otherwise rerouting would happen even when the IGP would have fixed the problem very soon, not only for a “hung” PFE.
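
The policy argued for above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the field names, and the threshold of 3 RTOs are hypothetical and are not real Linux sysctls or kernel code. Only "off by default" and "at least 1 s" come from this message.

```python
from dataclasses import dataclass


@dataclass
class RehashPolicy:
    """Hypothetical knobs for illustration; these are not real Linux sysctls."""
    enabled: bool = False           # switched off by default, as proposed
    min_rtos: int = 3               # assumed number of consecutive RTOs required
    min_stall_seconds: float = 1.0  # time left for the IGP (e.g. OSPF) to repair the path

    def should_rehash(self, consecutive_rtos: int, stall_seconds: float) -> bool:
        """Return True only if an admin opted in and both thresholds are met."""
        if not self.enabled:
            return False
        return (consecutive_rtos >= self.min_rtos
                and stall_seconds >= self.min_stall_seconds)


if __name__ == "__main__":
    default = RehashPolicy()
    opted_in = RehashPolicy(enabled=True)
    for rtos, stall in [(1, 0.2), (3, 0.6), (3, 1.5)]:
        print(rtos, stall,
              default.should_rehash(rtos, stall),
              opted_in.should_rehash(rtos, stall))
```

With the default policy nothing is ever rehashed; with the opted-in policy a flow is moved only after the IGP has had its chance.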
Eduard
From: Gyan Mishra [mailto:hayabusagsm@gmail.com]
Sent: Tuesday, August 10, 2021 4:09 PM
To: Vasilenko Eduard <vasilenko.eduard@huawei.com>
Cc: 6man WG <ipv6@ietf.org>; IETF discussion list <ietf@ietf.org>; Phillip Hallam-Baker <phill@hallambaker.com>; Theodore Ts'o <tytso@mit.edu>; Töma Gavrichenkov <ximaera@gmail.com>
Subject: Re: IPv6 Anycast has been killed by LINUX patch in 2016 - who cares?


Hi Eduard

On Tue, Aug 10, 2021 at 8:34 AM Vasilenko Eduard <vasilenko.eduard@huawei.com> wrote:
“Let’s kill the Linux hack altogether” is probably too strong.
IMHO: it should be completely switched off by default. But if some admin would like to use it, let them activate it and use it.

    Gyan> Agreed
Optimization across all OSI layers does not look like a good architectural decision, but if somebody wants to do it, why not.
 Gyan> Agreed
The original problem that was raised for this Linux feature (the original use case):
many vendors have microcode so bad that, in a big DC environment, it is very common to have a broken PFE that the control plane is not aware of. The result is a “silent drop” that lasts until manual intervention.
Of course, it is better to monitor such a situation in a different way (IOAM, BFD), but if one already has hundreds or thousands of switches, that is not a short-term proposition. A faster workaround is needed.

     Gyan> I believe the original problem with hashing was reported not in a DC environment but over the Internet? For the general Internet scenario, not rehashing at all as the default behavior is the best solution. For the DC and other scenarios, it is fine for Linux developers to change the behavior as they see fit for their environment.


Ed/
From: ietf [mailto:ietf-bounces@ietf.org] On Behalf Of Gyan Mishra
Sent: Tuesday, August 10, 2021 3:10 PM
To: Töma Gavrichenkov <ximaera@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>; Phillip Hallam-Baker <phill@hallambaker.com>; 6man WG <ipv6@ietf.org>; IETF discussion list <ietf@ietf.org>
Subject: Re: IPv6 Anycast has been killed by LINUX patch in 2016 - who cares?

Hi Töma

On Tue, Aug 10, 2021 at 5:55 AM Töma Gavrichenkov <ximaera@gmail.com> wrote:
Peace,
On Tue, Aug 10, 2021, 9:31 AM Gyan Mishra <hayabusagsm@gmail.com> wrote:
a patch that makes the default less aggressive by restoring the original default behavior of recomputing the hash only after multiple RTOs.

Let's now talk about hacks, right?

A flow is basically a stream of similar data within one or more connections.  This is an application layer concept.  Architecturally, it may change on a connection if the data flow within the connection changes.

   Gyan> Agreed

E.g. we've established a connection to [youtube DNS A entry]:443, downloaded the hypertext, but now we're going to reuse the same established connection to stream video, so the network should better treat that connection somehow differently now.

The flow label was never supposed to be a legitimate control over routing.  It shouldn't change over one, two, or a hundred RTOs.  It generally only changes when the flow becomes different.
I believe this was so obvious to the authors of the original specification in 2003 that they even forgot to actually state it.

    Gyan> Very good point. So let’s say you have an IPv4 or IPv6 TCP anycast connection: you should stay on that proximity-routed flow for the whole lifetime of a long-lived TCP connection. But with the Linux hack, we now shift immediately after the first RTO to try a different BGP anycast path, hoping for better results in case the first path was congested or having issues. This is definitely an application-based network engineering hack by a Linux developer who had the best intentions of building an application-network-aware, self-healing network. From a technical standpoint, as a TCP RST has already been received and we are re-establishing the connection, I do not understand why this was such a bad thing. It is understandably aggressive, but the thought process does make sense. The Linux developer’s thought was that if you got an RTO, then more than likely that network path is bad, so let’s rehash to a different path immediately. The downside I can see is that the first anycast path from BGP path selection was the best, lowest-latency path, but now the application thinks it understands the network better than the network engineers do and thinks it is better to rehash to a different path immediately. The MAJOR problem with that is that, as BGP anycast is proximity based, you could end up going halfway around the world for the second-best path, and voila: from the user’s happy-eyeballs perspective (not the RFC 6555 kind), TCP anycast is now completely broken, thus the subject heading “IPv6 Anycast has been killed by Linux patch”.
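
The failure mode described above can be illustrated with a toy ECMP model in Python. It is a sketch under loud assumptions: the site names, the addresses (taken from the 2001:db8::/32 documentation prefix), and the SHA-256 based hash are all hypothetical. Real routers use their own hash functions and usually include ports and protocol as well.

```python
import hashlib

# Hypothetical equal-cost next hops, each leading to a different anycast site.
PATHS = ["anycast-site-near", "anycast-site-far"]


def ecmp_pick(src: str, dst: str, flow_label: int, paths: list) -> str:
    """Toy ECMP: hash (src, dst, flow label) and pick one equal-cost path."""
    if not 0 <= flow_label < 2 ** 20:
        raise ValueError("the IPv6 flow label is a 20-bit field")
    key = f"{src}|{dst}|{flow_label}".encode()
    digest = hashlib.sha256(key).digest()
    return paths[int.from_bytes(digest[:4], "big") % len(paths)]


def paths_seen(src: str, dst: str, labels, paths: list) -> set:
    """Return every path that one address pair can land on across flow labels."""
    return {ecmp_pick(src, dst, label, paths) for label in labels}


if __name__ == "__main__":
    src, dst = "2001:db8::1", "2001:db8:ffff::1"
    # A stable flow label keeps the connection pinned to one site.
    print(ecmp_pick(src, dst, 12345, PATHS), ecmp_pick(src, dst, 12345, PATHS))
    # Changing only the flow label can land the same connection on another site.
    print(sorted(paths_seen(src, dst, range(256), PATHS)))
```

In this model the only input that changes is the flow label, yet the same address pair can reach either site. For stateful TCP anycast, landing on the other site means reaching a server that has no state for the connection.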

What Tom proposed is, of course, way better than how it works now.  Especially the socket option — yay, Linux is finally going to implement the "MUST" in RFC3697#3!  We harbour the hope that other operating systems would do the same good thing.

Gyan> Given what I stated above, I would say let the network do the networking. CDN traffic makes up 90% plus of Internet traffic and is geo load balanced worldwide, and the IETF ALTO WG does application-based traffic optimization, with BGP-LS / PCEP, CDN, and RSVP / SR aware network optimization solutions that already exist today, so let’s kill the Linux hack altogether. As the Linux server is completely unaware of network conditions, any rehash is a bad thing: it breaks TCP anycast by sending you clear around the world when you should be “sticky”, staying on the optimized, proximity-based network path chosen by BGP anycast best path selection and only shifting to an alternate BGP path when that path is no longer available. Let routing do its routing!!

But the idea I'm trying to drive home is: fixing (temporary) network delivery issues via the control of a strictly application level feature is among the dirtiest of the hacks possible.

  Gyan> Firmly agreed

And it kind of amazes me how people call anycast a hack (while it's perfectly the behaviour natural to the Internet, a global self-healing internetwork, as designed in the 1970s) and still consider this a legitimate behaviour.

Gyan> Agreed

Gyan> After reading the feedback from Töma, can we not rehash at all in the default Linux behavior? See the MAJOR problem described above that is created when you rehash with BGP anycast: basically, any rehash breaks IPv6 flow-label-based TCP anycast CDN load balancing.

--
Töma
--

Gyan Mishra

Network Solutions Architect

Email gyan.s.mishra@verizon.com

M 301 502-1347
