Re: IPv6 Anycast has been killed by LINUX patch in 2016 - who cares?

Gyan Mishra <hayabusagsm@gmail.com> Tue, 10 August 2021 16:29 UTC

From: Gyan Mishra <hayabusagsm@gmail.com>
Date: Tue, 10 Aug 2021 12:29:19 -0400
Message-ID: <CABNhwV08wzZrsOnGq+F7ctL4+1uoy1mk7QXM_K4T5St+Jo46GA@mail.gmail.com>
Subject: Re: IPv6 Anycast has been killed by LINUX patch in 2016 - who cares?
To: Vasilenko Eduard <vasilenko.eduard@huawei.com>
Cc: 6man WG <ipv6@ietf.org>, IETF discussion list <ietf@ietf.org>, Phillip Hallam-Baker <phill@hallambaker.com>, Theodore Ts'o <tytso@mit.edu>, Töma Gavrichenkov <ximaera@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/H4DvKMzDbpEF39-Ah3oevmZobuQ>

Hi Eduard

On Tue, Aug 10, 2021 at 9:24 AM Vasilenko Eduard <
vasilenko.eduard@huawei.com> wrote:

> Hi Gyan,
>
> I am not sure why Linux RTO refreshed the flow label initially in
> 2014-2016.
>

   Gyan> I believe the kernel update / hack was about rehashing the load
balancing so that the flow picks the next-best-proximity BGP Anycast path
after the first RTO.  Since it is related to the load-balancing hash, that is
where the RFC 6437 flow label and its applicability to server load balancing
(RFC 7098) come into play.  Which presentation at IETF 111 was it?
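
To make the hashing point concrete, here is a minimal sketch of how a
change of flow label can move a flow to a different equal-cost path.  It
is only an illustration: the function names, the addresses, the next-hop
names and the use of SHA-256 are all assumptions made for this example,
not what any particular router or the Linux kernel actually does.

```python
import hashlib


def ecmp_next_hop(src, dst, flow_label, next_hops):
    """Pick one equal-cost next hop from a hash of (src, dst, flow label).

    Hypothetical stand-in for a router's ECMP hash; real devices use
    their own hash functions and input fields.
    """
    if not 0 <= flow_label < 2**20:
        raise ValueError("the IPv6 flow label is a 20-bit field")
    key = f"{src}|{dst}|{flow_label:05x}".encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]


def paths_seen(src, dst, flow_labels, next_hops):
    """Return the set of next hops that a series of flow labels maps to."""
    return {ecmp_next_hop(src, dst, fl, next_hops) for fl in flow_labels}


# Example values (documentation prefix, made-up next-hop names).
HOPS = ["nh-a", "nh-b", "nh-c", "nh-d"]
stable = paths_seen("2001:db8::1", "2001:db8::53", [0x12345] * 10, HOPS)
varied = paths_seen("2001:db8::1", "2001:db8::53", range(64), HOPS)
```

With a fixed flow label every packet hashes to the same next hop, so
`stable` holds a single entry.  Once the label changes between packets,
`varied` spreads over several next hops, which is exactly the path shift
being discussed.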

> But the presentation on the last IETF that attracted my attention was
> about how to mitigate this type of vendor's bug: loss of sync between
> control plane and data plane. That leads to prolonging “silent drop”.
>
> Gyan> Unfortunately I may have stepped away and missed that discussion.  So
> was the Linux kernel rehash after the first RTO mentioned as a server-side
> fix?  A much better fix, as you and others mentioned, is BFD with tight
> timers (to detect one-way fiber in the DC) together with iOAM.
>
> I have stated in the 1st message: it is a problem that this work-around
> has become the default for the whole Internet.
>
> I agree that it should not happen by default after any number of RTOs.
>
> Gyan> Agreed
>
> But even after activation, it makes sense to give IGP a chance to repair
> the problem.
>
> IMHO: RTO timer should be configurable or at least 1s to give OSPF a
> chance.
>
   Gyan> I believe the default RTO is 20 seconds.  I agree it would be nice if
the RTO timer were configurable.
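
As a rough way to reason about "giving the IGP a chance", the sketch below
adds up the time that passes before the Nth consecutive RTO fires,
assuming the usual exponential backoff (the timer doubles after each
expiry).  The function names, the 120 s cap and the RTO values used in
the example are placeholder assumptions for illustration, not measured
Linux defaults.

```python
def time_until_rehash(initial_rto, rtos_before_rehash, max_rto=120.0):
    """Seconds from the first unacknowledged send until the Nth RTO fires.

    Assumes the retransmission timer doubles after every expiry and is
    capped at max_rto (an assumed parameter).
    """
    elapsed = 0.0
    rto = initial_rto
    for _ in range(rtos_before_rehash):
        elapsed += rto
        rto = min(rto * 2, max_rto)
    return elapsed


def igp_can_repair_first(initial_rto, rtos_before_rehash, igp_convergence):
    """True if the IGP would converge before the host changes its hash."""
    return igp_convergence < time_until_rehash(initial_rto, rtos_before_rehash)


# Hypothetical numbers: a 0.5 s IGP repair against different rehash policies.
after_one_short_rto = igp_can_repair_first(0.2, 1, 0.5)
after_three_rtos = igp_can_repair_first(1.0, 3, 0.5)
```

Under these assumed numbers, a rehash on the first short RTO happens
before the IGP has repaired anything, while waiting for several RTOs (or
for a single RTO of at least 1 s) leaves the IGP room to converge first.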

> Because rerouting would happen even if IGP would fix the problem very
> soon, not only for “hung” PFE.
>
> Eduard
>
> *From:* Gyan Mishra [mailto:hayabusagsm@gmail.com]
>
> *Sent:* Tuesday, August 10, 2021 4:09 PM
> *To:* Vasilenko Eduard <vasilenko.eduard@huawei.com>
> *Cc:* 6man WG <ipv6@ietf.org>; IETF discussion list <ietf@ietf.org>;
> Phillip Hallam-Baker <phill@hallambaker.com>; Theodore Ts'o <tytso@mit.edu>;
> Töma Gavrichenkov <ximaera@gmail.com>
> *Subject:* Re: IPv6 Anycast has been killed by LINUX patch in 2016 - who
> cares?
>
>
>
>
>
> Hi Eduard
>
>
>
> On Tue, Aug 10, 2021 at 8:34 AM Vasilenko Eduard <
> vasilenko.eduard@huawei.com> wrote:
>
> It is probably too strong: “let’s kill the Linux hack altogether”.
>
> IMHO: it should be completely switched off by default. But if some admin
> would like to use it – let them activate it and use it.
>
>
>
>     Gyan> Agreed
>
> Cross-layer optimization across the whole OSI model does not look like a good
> architectural decision, but if somebody wants to do it – why not.
>
>  Gyan> Agreed
>
> The original problem that was raised for this Linux feature (original use
> case):
> many vendors already have such bad microcode that, in a big DC environment,
> it is very common to have a broken PFE that the control plane is not aware
> of: a “silent drop” that lasts until manual intervention.
>
> Of course, it is better to monitor such a situation in a different way
> (iOAM, BFD), but if one has already hundreds or thousands of switches – it
> is not a short-term proposition. Faster work-around is needed.
>
>
>
>      Gyan> I believe the original problem with hashing was reported not in a DC
> environment but over the Internet?  For the general Internet scenario, not
> rehashing at all as the default behavior is the best solution.  For the DC
> and other scenarios, it is fine for Linux developers to change the behavior
> as they see fit for their environment.
>
>
>
>
>
> Ed/
>
> *From:* ietf [mailto:ietf-bounces@ietf.org] *On Behalf Of *Gyan Mishra
> *Sent:* Tuesday, August 10, 2021 3:10 PM
> *To:* Töma Gavrichenkov <ximaera@gmail.com>
> *Cc:* Theodore Ts'o <tytso@mit.edu>; Phillip Hallam-Baker <
> phill@hallambaker.com>; 6man WG <ipv6@ietf.org>; IETF discussion list <
> ietf@ietf.org>
> *Subject:* Re: IPv6 Anycast has been killed by LINUX patch in 2016 - who
> cares?
>
>
>
> Hi Töma
>
>
>
> On Tue, Aug 10, 2021 at 5:55 AM Töma Gavrichenkov <ximaera@gmail.com>
> wrote:
>
> Peace,
>
> On Tue, Aug 10, 2021, 9:31 AM Gyan Mishra <hayabusagsm@gmail.com> wrote:
>
> a patch that makes default less aggressive by restoring the original
> default behavior to recompute hash only after multiple RTOs.
>
>
>
> Let's now talk about hacks, right?
>
>
>
> A flow is basically a stream of similar data within one or more
> connections.  This is an application layer concept.  Architecturally, it
> may change on a connection if the data flow within the connection changes.
>
>
>
>    Gyan> Agreed
>
>
>
> E.g. we've established a connection to [youtube DNS A entry]:443,
> downloaded the hypertext, but now we're going to reuse the same established
> connection to stream video, so the network should better treat that
> connection somehow differently now.
>
>
>
> The flow label was never supposed to be a legitimate control over
> routing.  It shouldn't change over one, two, or a hundred RTOs.  It
> generally only changes when the flow becomes different.
>
> I believe this was so obvious to the authors of the original specification
> in 2003 that they even forgot to actually state it.
>
>
>
>     Gyan> Very good point.  So say you have an IPv4 or IPv6 TCP Anycast
> connection: you should stay on that proximity-routed path for the whole
> duration of the connection, however long-lived the TCP session is.  With the
> Linux hack, we now shift immediately after the first RTO to a different BGP
> Anycast path and hope for better results in case the first path was
> congested or having issues.  This is definitely an application-based network
> engineering hack by a Linux developer who had the best intentions: an
> application-aware, self-healing network.  From a technical standpoint, as a
> TCP RST has already been received and we are re-establishing the connection,
> I do not understand why this was such a bad thing.  It is understandably
> aggressive, but the thought process does make sense.  The Linux developer's
> thinking was that if you got an RTO, then more than likely that network path
> is bad, so let’s rehash to a different path immediately.  The downside I can
> see is that the first Anycast path chosen by BGP path selection was the
> best, lowest-latency path, but now the application thinks it understands the
> network better than the network engineers and decides it is better to rehash
> to a different path immediately.  The MAJOR problem with that is that, as BGP
> Anycast is proximity based, you could end up going halfway around the world
> for the second-best path, and now, voila, TCP Anycast is completely broken
> from the happy-eyeballs user perspective (not in the RFC 6555 sense), hence
> the subject heading “IPv6 Anycast has been killed by Linux patch”.
>
>
>
> What Tom proposed is, of course, way better than how it works now.
> Especially the socket option — yay, Linux is finally going to implement the
> "MUST" in RFC3697#3!  We harbour the hope that other operating systems
> would do the same good thing.
>
>
>
> Gyan> Given what I stated above, I would say let the network do the
> networking.  CDN traffic makes up 90% plus of Internet traffic and is GEO
> load balanced worldwide, the IETF ALTO WG works on application-based traffic
> optimization, and BGP-LS / PCEP, CDN, and RSVP / SR aware network
> optimization solutions already exist today, so let’s kill the Linux hack
> altogether.  As the Linux server is completely unaware of network
> conditions, any rehash is a bad thing: it breaks TCP Anycast by sending you
> clear around the world when you should be “sticky”.  Based on BGP Anycast
> best path selection, you should stay on the optimized, proximity-based
> network path and only shift to an alternate BGP path when that path is no
> longer available.  Let routing do its routing!!
>
>
>
> But the idea I'm trying to drive home is: fixing (temporary) network
> delivery issues via the control of a strictly application level feature is
> among the dirtiest of the hacks possible.
>
>
>
>   Gyan> Firmly agreed
>
>
>
> And it kind of amazes me how people call anycast a hack (while it's
> perfectly the behaviour natural to the Internet, a global self-healing
> internetwork, as designed in 1970s) and still consider this a legitimate
> behaviour.
>
>
>
> Gyan> Agreed
>
>
>
> Gyan> After reading the feedback from Töma, can we not rehash at all by
> default in the Linux patch?  See the MAJOR problem described above that is
> created when you rehash with BGP Anycast: basically any rehash literally
> breaks IPv6 flow label based TCP Anycast CDN load balancing.
>
>
>
> --
>
> Töma
>
> --
>
> <http://www.verizon.com/>
>
> *Gyan Mishra*
>
> *Network Solutions Architect *
>
> *Email gyan.s.mishra@verizon.com <gyan.s.mishra@verizon.com>*
>
> *M 301 502-1347*
>
>
>
>
>
>
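
The Anycast breakage described in the quoted discussion above can be
sketched as a toy model: each anycast site keeps its own TCP connection
table, the network hashes (connection, flow label) onto a site, and a
packet that lands on a site with no state for that connection is answered
with a reset.  Everything here is hypothetical and simplified (class and
function names, site names, the SHA-256 based hash); it is not how any
real CDN or router is implemented.

```python
import hashlib


def pick_site(conn_id, flow_label, sites):
    """Toy stand-in for the network: hash (connection, label) onto a site."""
    key = f"{conn_id}|{flow_label}".encode()
    return sites[hashlib.sha256(key).digest()[0] % len(sites)]


class AnycastService:
    """Several sites share one anycast address but not their TCP state."""

    def __init__(self, sites):
        self.sites = list(sites)
        self.state = {site: set() for site in self.sites}

    def syn(self, conn_id, flow_label):
        """Open a connection; its state is created at whichever site is hit."""
        site = pick_site(conn_id, flow_label, self.sites)
        self.state[site].add(conn_id)
        return site

    def data(self, conn_id, flow_label):
        """Deliver a segment; a site without state for it answers with RST."""
        site = pick_site(conn_id, flow_label, self.sites)
        return site, ("ACK" if conn_id in self.state[site] else "RST")


def label_landing_elsewhere(service, conn_id, old_label):
    """Find a flow label that the toy hash sends to a different site."""
    home = pick_site(conn_id, old_label, service.sites)
    for label in range(1, 2**20):
        if pick_site(conn_id, label, service.sites) != home:
            return label
    return None


service = AnycastService(["site-near", "site-far"])
home = service.syn("conn-1", 0)
sticky = service.data("conn-1", 0)           # same label, same site
moved_label = label_landing_elsewhere(service, "conn-1", 0)
moved = service.data("conn-1", moved_label)  # new label, other site
```

As long as the label stays the same, the segment reaches the site that
holds the connection and is acknowledged.  After the label change the
segment reaches the other site, which has never seen the connection, so
the long-lived TCP session is reset.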
-- 

<http://www.verizon.com/>

*Gyan Mishra*

*Network Solutions Architect*

*Email gyan.s.mishra@verizon.com <gyan.s.mishra@verizon.com>*

*M 301 502-1347*