RE: IPv6 Anycast has been killed by LINUX patch in 2016 - who cares?

Vasilenko Eduard <vasilenko.eduard@huawei.com> Tue, 10 August 2021 12:35 UTC

From: Vasilenko Eduard <vasilenko.eduard@huawei.com>
To: Gyan Mishra <hayabusagsm@gmail.com>, Töma Gavrichenkov <ximaera@gmail.com>
CC: Theodore Ts'o <tytso@mit.edu>, Phillip Hallam-Baker <phill@hallambaker.com>, 6man WG <ipv6@ietf.org>, IETF discussion list <ietf@ietf.org>
Subject: RE: IPv6 Anycast has been killed by LINUX patch in 2016 - who cares?
Date: Tue, 10 Aug 2021 12:34:45 +0000
Message-ID: <301d978d8c27427f954af79070fe5741@huawei.com>
References: <CALZ3u+aP=v_1=w1xqfEKof7Cc6Ba3pwOYV3O=0b=NxS4hRWhiA@mail.gmail.com> <YRBdZrKV+MrrhUCG@mit.edu> <CALZ3u+aBdE3Bw3_ry+CuV4tS016c4mWewJFpr0aCbBnwj70Vzg@mail.gmail.com> <a3833e04-c123-ef52-95f9-cae80a1390e7@foobar.org> <CAMm+LwiAbiK618+kY9JTLr7_mQd-E5TKyNsGqOLrGQoLzjJo=A@mail.gmail.com> <CALZ3u+bLVUZf1fTHQvAVzOnToiPcsXEyTNt56hNAXz4=-G5-6w@mail.gmail.com> <CAHw9_i+k9x1g3bcst6rHcXpesEVwnPtV6DzsFAxi8dC6CRMZPw@mail.gmail.com> <CALx6S346mqNaE+s1DH7S7RutTpzfrC5oX1No5Jb72sTvVQjtpQ@mail.gmail.com> <CAHw9_i+ELJS_xqcEHM4raq+f=PZ5yw1ptfG3a6VypZmWTo11-A@mail.gmail.com> <CAOj+MMGzWq1OrwBQW_Mz4gB+z9wJSdQnFCkTmWiHi_Tm3ty47g@mail.gmail.com> <YRHx4c8/nOh5aXN1@mit.edu> <CABNhwV1HdSrzHDLhuSMaWY+9UaHnFYaYo75fN3+JMgMnf+Pnhw@mail.gmail.com> <CALZ3u+Z4XYf0gLrhsA5D1pJz5O2Wn6fpBugh6LeTOkGb9Pn=7A@mail.gmail.com> <CABNhwV3rAjueGD_vKgoTc7egF9RzDXbswTibOZYb50da3H8Ljw@mail.gmail.com>
In-Reply-To: <CABNhwV3rAjueGD_vKgoTc7egF9RzDXbswTibOZYb50da3H8Ljw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/s04HRNXqAQJ_Pda7qo9_znLVugA>
List-Id: "IPv6 Maintenance Working Group \(6man\)" <ipv6.ietf.org>

“Let’s kill the Linux hack altogether” is probably too strong.
IMHO it should be completely switched off by default, but if some admin would like to use it, let them activate it and use it.
Optimization across all OSI layers does not look like a good architectural decision, but if somebody wants to do it, why not.

The original problem that was raised for this Linux feature (the original use case):
Many vendors already have microcode so bad that, in a big DC environment, it is very common to have a broken PFE that the control plane is not aware of: a “silent drop” that lasts until manual intervention.
Of course, it is better to monitor such a situation in a different way (iOAM, BFD), but if one already has hundreds or thousands of switches, that is not a short-term proposition. A faster workaround is needed.
Ed/
From: ietf [mailto:ietf-bounces@ietf.org] On Behalf Of Gyan Mishra
Sent: Tuesday, August 10, 2021 3:10 PM
To: Töma Gavrichenkov <ximaera@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>; Phillip Hallam-Baker <phill@hallambaker.com>; 6man WG <ipv6@ietf.org>; IETF discussion list <ietf@ietf.org>
Subject: Re: IPv6 Anycast has been killed by LINUX patch in 2016 - who cares?

Hi Töma

On Tue, Aug 10, 2021 at 5:55 AM Töma Gavrichenkov <ximaera@gmail.com> wrote:
Peace,
On Tue, Aug 10, 2021, 9:31 AM Gyan Mishra <hayabusagsm@gmail.com> wrote:
a patch that makes the default less aggressive by restoring the original default behavior, which recomputes the hash only after multiple RTOs.

Let's now talk about hacks, right?

A flow is basically a stream of similar data within one or more connections.  This is an application layer concept.  Architecturally, it may change on a connection if the data flow within the connection changes.

   Gyan> Agreed

E.g. we've established a connection to [youtube DNS A entry]:443, downloaded the hypertext, but now we're going to reuse the same established connection to stream video, so the network had better treat that connection differently now.

The flow label was never supposed to be a legitimate control over routing.  It shouldn't change over one, two, or a hundred RTOs.  It generally only changes when the flow becomes different.
I believe this was so obvious to the authors of the original specification in 2003 that they even forgot to actually state it.

    Gyan> Very good point. So let’s say you have an IPv4 or IPv6 TCP Anycast connection: you should stay on that proximity-routed flow for the whole lifetime of the long-lived TCP connection. But with the Linux hack we now shift immediately after the first RTO to a different BGP anycast path and hope for better results in case the first path was congested or having issues. This is definitely an application-based network engineering hack by a Linux developer who had the best intentions of an application-aware, self-healing network. From a technical standpoint, as a TCP RST has already been received and we are re-establishing the connection, I do not understand why this was such a bad thing. Understandably it is aggressive, but the thought process does make sense. The Linux developer's thought was that if you got an RTO, then more than likely that network path is bad, so let’s rehash to a different path immediately. The downside I can see is that the first Anycast path from BGP path selection was the best, lowest-latency path, but now the application thinks it understands the network better than network engineers and thinks it’s better to rehash to a different path immediately. The MAJOR problem with that is that, as BGP Anycast is proximity based, you could end up going halfway around the world for the second-best path, and now, voila, TCP Anycast is completely broken from the happy eyeballs (not RFC 6555, but the user's) perspective, thus the subject heading “IPv6 Anycast has been killed by Linux patch”.
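
The path shift described above can be sketched with a toy model. This is only an illustration under stated assumptions: the site names, addresses, ports, and the SHA-256 based hash are hypothetical stand-ins (real routers use vendor-specific ECMP hashes, and the actual Linux behaviour is not reproduced here). It shows that a constant flow label keeps a connection on one path, while picking a new label after each RTO can move it to a different anycast site.

```python
import hashlib

# Hypothetical anycast sites reachable over equal-cost paths (assumption).
ANYCAST_SITES = ["site-a", "site-b", "site-c", "site-d"]


def ecmp_next_hop(src, dst, sport, dport, flow_label, next_hops):
    """Pick a next hop by hashing the flow identifiers.

    SHA-256 is a stand-in for a router's ECMP hash, not a real vendor
    algorithm. The only property that matters here is that the result
    depends on the flow label.
    """
    key = f"{src}|{dst}|{sport}|{dport}|{flow_label}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


def paths_for_labels(flow_labels, next_hops=ANYCAST_SITES):
    """Return the next hop chosen for each flow label of one connection."""
    # Documentation-prefix addresses and ports are illustrative only.
    return [
        ecmp_next_hop("2001:db8::1", "2001:db8:ffff::53", 40000, 443,
                      label, next_hops)
        for label in flow_labels
    ]


if __name__ == "__main__":
    # Sticky behaviour: the label never changes, so the path never changes.
    print(paths_for_labels([0x12345] * 5))
    # Rehash behaviour: a new label after every RTO may select a new path.
    print(paths_for_labels([0x12345, 0x0BEEF, 0x0CAFE, 0x00F00, 0x0ABCD]))
```

With anycast, each of those next hops may terminate at a different server that holds no state for the connection, which is why a mid-connection path change is more harmful than in plain ECMP toward a single destination.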

What Tom proposed is, of course, way better than how it works now.  Especially the socket option — yay, Linux is finally going to implement the "MUST" in RFC3697#3!  We harbour the hope that other operating systems would do the same good thing.

Gyan> Given what I stated above, I would say let the network do the networking. CDN makes up 90% plus of Internet traffic and is geo load balanced worldwide, and we have the IETF ALTO WG doing application-based traffic optimization, plus BGP-LS / PCEP, CDN, and RSVP / SR aware network optimization solutions that already exist today, so let’s kill the Linux hack altogether. As the Linux server is completely unaware of network conditions, any rehash is a bad thing: it breaks TCP Anycast by sending you clear around the world when you should be “sticky”, staying on the optimized, proximity-based network path chosen by BGP Anycast best path selection and only shifting to an alternate BGP path when that path is no longer available. Let routing do its routing!!

But the idea I'm trying to drive home is: fixing (temporary) network delivery issues via the control of a strictly application level feature is among the dirtiest of the hacks possible.

  Gyan> Firmly agreed

And it kind of amazes me how people call anycast a hack (while it's perfectly natural behaviour for the Internet, a global self-healing internetwork, as designed in the 1970s) and still consider this a legitimate behaviour.

Gyan> Agreed

Gyan> After reading the feedback from Töma, can we not rehash at all for the default in the Linux patch? See the MAJOR problem described above that is created when you rehash with BGP Anycast: basically any rehash literally breaks IPv6 flow label based TCP Anycast CDN load balancing.

--
Töma
--

Gyan Mishra

Network Solutions Architect

Email gyan.s.mishra@verizon.com

M 301 502-1347