Re: [tcpm] [v6ops] Flow Label Load Balancing

Alexander Azimov <a.e.azimov@gmail.com> Thu, 26 November 2020 20:36 UTC

References: <CAEGSd=DY8t8Skor+b6LSopzecoUUzUZhti9s0kdooLZGxPEt+w@mail.gmail.com> <d29042a7-742b-a445-cf60-2773e5515ae5@gont.com.ar> <CALx6S37+1duoNGR3dZWesHsZvx15kX9wCWufPMh=esvMaSMF_g@mail.gmail.com> <63e7aad3-7094-7492-dbe4-3eefb5236de3@gont.com.ar> <CALx6S37t4jump6S-R5_xdo5DF+RnHtT4rU5-RuiC-2GQ0PXxkQ@mail.gmail.com> <239c4b67-1d9a-da00-7bb0-52019be1b7c1@joelhalpern.com> <CALx6S34uSAne_LyhrWDcjkR5p7MO6ggm_Ua_h+6nkX41S=Ge=A@mail.gmail.com> <a8aad80c-1a4b-4a86-4c13-7391e8513049@joelhalpern.com> <CALx6S36xYADqNrPp1A_Ohx48d7SdV2oFOgVFVV+y_tDbGQG6ug@mail.gmail.com> <abf9c63a-2f7e-6f28-34e8-b3e9598cd2b9@gmail.com> <CALx6S36PTVT49CQHdJNx88PHyYQS23WYP3A7Xw1-+f_tt4H3Gg@mail.gmail.com> <CAEGSd=BGqFTygTiAt1v-71W3RTpdyVqyYzD1vi9uKebPPMoE5Q@mail.gmail.com> <a652123e-2371-3241-2884-425f3113344d@gmail.com>
In-Reply-To: <a652123e-2371-3241-2884-425f3113344d@gmail.com>
From: Alexander Azimov <a.e.azimov@gmail.com>
Date: Thu, 26 Nov 2020 23:36:33 +0300
Message-ID: <CAEGSd=Cer8TYMUtTQ=vTFtuSJ==brsJHY=UK37S3BzSg0z=P9w@mail.gmail.com>
To: Brian E Carpenter <brian.e.carpenter@gmail.com>
Cc: Tom Herbert <tom@herbertland.com>, tcpm <tcpm@ietf.org>, IPv6 Operations <v6ops@ietf.org>
Content-Type: multipart/alternative; boundary="0000000000008d395f05b5088044"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/t0lRoZeslzQTT3JFyTrFBQzZHEU>
Subject: Re: [tcpm] [v6ops] Flow Label Load Balancing

Brian,

May I ask you to give more details on this issue:

> I am convinced that it is often unsafe, because that depends on exactly how
> a server farm load balancer handles such cases. If it maintains the TCP
> session on the same actual server if there is SYN(-ACK) retransmission, the
> flow label must not be changed.

If a retransmitted SYN goes to another server or anycast point, it won't
do much harm. At least, we haven't experienced such problems in our
datacenters.
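
For concreteness, here is a minimal Python sketch of the wording under discussion (rules 1-3, quoted further down). The class and method names, the SHA-256 hash, and the 8-byte random number are assumptions chosen for illustration only; this is not how any particular TCP stack implements it.

```python
import hashlib
import os
from dataclasses import dataclass, field

FLOW_LABEL_BITS = 20  # width of the IPv6 flow label field


@dataclass
class Connection:
    """Hypothetical per-connection state; names are illustrative only."""
    src: str
    dst: str
    sport: int
    dport: int
    proto: int = 6                 # TCP
    established: bool = False
    rehash_on_rto: bool = False    # rule 2: switched off by default
    nonce: bytes = field(default_factory=lambda: os.urandom(8))

    def flow_label(self) -> int:
        """Hash of the 5-tuple and the per-connection random number."""
        key = f"{self.src}|{self.dst}|{self.sport}|{self.dport}|{self.proto}"
        digest = hashlib.sha256(key.encode() + self.nonce).digest()
        label = int.from_bytes(digest[:4], "big") & ((1 << FLOW_LABEL_BITS) - 1)
        # A zero flow label means "not labeled" (RFC 6437), so avoid it.
        return label or 1

    def on_retransmit(self) -> int:
        """Apply rules 1-3 and return the label for the retransmission."""
        if not self.established:
            # Rule 1: SYN / SYN-ACK retransmission, recalculate.
            self.nonce = os.urandom(8)
        elif self.rehash_on_rto:
            # Rule 2: RTO in an established session, only if opted in.
            self.nonce = os.urandom(8)
        # Rule 3: otherwise the label is preserved unchanged.
        return self.flow_label()
```

Note that drawing a new random number only makes a different label very likely, not certain, and a different label does not guarantee a different path; that depends on how (and whether) the network hashes on it.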

On Thu, 26 Nov 2020 at 22:53, Brian E Carpenter <brian.e.carpenter@gmail.com>
wrote:

> Alexander,
>
> > 1. In the case of SYN or SYN-ACK retransmission flow label SHOULD be
> recalculated.
> ...
> > The first point stands for redirecting connection from the degraded path
> before the connection is established, and it looks safe.
>
> I am convinced that it is often unsafe, because that depends on exactly
> how a server farm load balancer handles such cases. If it maintains the TCP
> session on the same actual server if there is SYN(-ACK) retransmission, the
> flow label must not be changed.
> The benefit for possible broken routing is quite hypothetical, whereas the
> risk to
> session persistence at the server is real. (Remember that the source
> cannot know by magic that there is server load balancing at the far end,
> but it's the common case today.)
>
> > 2. In the case of RTO timeout expiration in the established TCP session
> the flow label MAY be recalculated. This setting MUST be switched off by
> default.
>
> As noted before, this is only safe in a limited domain where it is known
> with certainty that server load balancing is not in use. I think it's
> therefore a corner case and specifically, the appropriate normative
> statement would be SHOULD NOT, which is defined very carefully in RFC2119
> exactly for this sort of case.
>
> But in fact I think we should not attempt to legislate on this point. The
> value is marginal at best. As others have said, this is not the way to
> tackle operational routing problems.
>
> Regards
>    Brian Carpenter
>
> On 26-Nov-20 22:34, Alexander Azimov wrote:
> > Dear colleagues,
> >
> > We started discussing an incorrect default behavior, and Tom has already
> confirmed that it will be fixed.
> >
> > Later the thread turned into an argument about whether the flow label can be
> changed during connection/flow lifetime according to current RFC documents,
> though these documents can be updated. This looks a bit weird to me
> because I always thought that it is the IETF community's responsibility to
> document proper solutions. If something undocumented but worthy happens in
> the industry - IETF should catch up. So, I would like to get back to the
> discussion of reasons and their safety.
> >
> > The general idea of changing the routing path upon network
> outage/degradation looks obvious. Getting kind of source-based routing can
> significantly reduce the reaction time and improve end-user experience. The
> flow label has a perfect match here: transparent for the application, set
> by the source, not part of 5-tuple, while it can be used in the load
> balancing. IMO it is poorly documented. I would like to hear your feedback on
> the following wording:
> >
> > Let's say that the flow label is a hash of values from the IP packet's
> 5-tuple and a random number. Then
> >
> > 1. In the case of SYN or SYN-ACK retransmission flow label SHOULD be
> recalculated.
> > 2. In the case of RTO timeout expiration in the established TCP session
> the flow label MAY be recalculated. This setting MUST be switched off by
> default.
> > 3. Otherwise flow label SHOULD be preserved unchanged.
> >
> > The first point stands for redirecting connection from the degraded path
> before the connection is established, and it looks safe. The second one can
> also improve performance if it is set on the server-side. Please comment if
> you see security flaws in such a design.
> >
> > On Thu, 26 Nov 2020 at 03:16, Tom Herbert <tom@herbertland.com> wrote:
> >
> >     On Wed, Nov 25, 2020 at 3:33 PM Brian E Carpenter
> >     <brian.e.carpenter@gmail.com <mailto:brian.e.carpenter@gmail.com>>
> wrote:
> >     >
> >     > I'm not Joel, but I did once spend some time grepping RFCs to find
> out whether
> >     > "flow" or "microflow" was the preferred term. In RFC2474, which is
> normative,
> >     > we have:
> >     >
> >     >    Microflow: a single instance of an application-to-application
> flow of
> >     >    packets which is identified by source address, destination
> address,
> >     >    protocol id, and source port, destination port (where
> applicable).
> >     >
> >     > But in the flow label work we explicitly avoided being that
> precise, and
> >     > did not use the term "microflow". There might be some load
> balancing
> >     > scenarios where you want a broader definition, even including
> bidirectional
> >     > flows. There are expired drafts on that topic:
> >     > draft-tarreau-extend-flow-label-balancing
> >     > draft-wang-6man-flow-label-reflection
> >     >
> >     Brian,
> >
> >     Random Packet Spraying
> >     (
> https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.297.529&rep=rep1&type=pdf
> )
> >     is an interesting idea where packets for a single connection are
> >     purposely distributed across multiple paths for load distribution.
> Per
> >     packet randomized flow labels with flow label aware ECMP makes this
> >     quite easy to do without requiring any special support in switches
> >     like you'd need with IPv4. I'm not necessarily advocating this, but
> it
> >     does highlight one potential use case of having a flow label that
> >     doesn't have rigidly defined requirements on the host.
> >
> >     Tom
> >
> >     > Regards
> >     >    Brian
> >     >
> >     > On 26-Nov-20 11:13, Tom Herbert wrote:
> >     > > Joel, is there a normative definition of a flow?
> >     > >
> >     > > On Wed, Nov 25, 2020, 1:27 PM Joel M. Halpern <
> jmh@joelhalpern.com <mailto:jmh@joelhalpern.com> <mailto:
> jmh@joelhalpern.com <mailto:jmh@joelhalpern.com>>> wrote:
> >     > >
> >     > >     No, as Brian says, there are escape clauses in the flow
> definitions.
> >     > >
> >     > >     But changing the flow label due to traffic problems does not
> correspond
> >     > >     to packets being in actually different flows.
> >     > >     If one were using UDP, and mixing loss sensitive packets
> with loss
> >     > >     insensitive packets for a special application, sure, one
> could use two
> >     > >     flow labels.  But that is not what you are describing.
> >     > >
> >     > >     Yours,
> >     > >     Joel
> >     > >
> >     > >     On 11/25/2020 3:05 PM, Tom Herbert wrote:
> >     > >     > Joel,
> >     > >     >
> >     > >     > Is there an RFC that clearly and unambiguously states that
> a host MUST
> >     > >     > use the same flow label for the lifetime _and_ clearly
> defines exactly
> >     > >     > what a flow is with respect to such a requirement (for
> instance, how
> >     > >     > would you define a flow and enforce such a requirement in
> UDP? IPsec?
> >     > >     > other encapsulations?). If there is such a requirement
> then we'll
> >     > >     > change the code to be conformant.
> >     > >     >
> >     > >     > Tom
> >     > >     >
> >     > >     > On Wed, Nov 25, 2020 at 12:39 PM Joel M. Halpern <
> jmh@joelhalpern.com <mailto:jmh@joelhalpern.com> <mailto:
> jmh@joelhalpern.com <mailto:jmh@joelhalpern.com>>> wrote:
> >     > >     >>
> >     > >     >> This kind of thing is why, as I understand it, MPTCP has
> discovery
> >     > >     >> mechanisms to know if both sides use it, and can select
> alternative
> >     > >     >> addresses for communication.
> >     > >     >>
> >     > >     >> Trying to guess flow labels that might avoid a problem
> because it might
> >     > >     >> be an ECMP problem, is just flailing about.  Not a good
> design for
> >     > >     >> operational protocols.
> >     > >     >>
> >     > >     >> And in general, designing protocols around "I know
> exactly what is going
> >     > >     >> on"  (the requirement for what you describe that goes
> well beyond just
> >     > >     >> "limited domains") is also a recipe for failure.
> >     > >     >>
> >     > >     >> The Flow Label RFCs are actually very explicit that a
> flow label is
> >     > >     >> supposed to be stable for the life of the flow.
> Otherwise, it isn't a
> >     > >     >> flow label.
> >     > >     >>
> >     > >     >> Yours,
> >     > >     >> Joel
> >     > >     >>
> >     > >     >> On 11/25/2020 2:35 PM, Tom Herbert wrote:
> >     > >     >>> Hi Fernando, comments in line...
> >     > >     >>>
> >     > >     >>> On Wed, Nov 25, 2020 at 12:13 AM Fernando Gont <
> fernando@gont.com.ar <mailto:fernando@gont.com.ar> <mailto:
> fernando@gont.com.ar <mailto:fernando@gont.com.ar>>> wrote:
> >     > >     >>>>
> >     > >     >>>> Hi, Tom,
> >     > >     >>>>
> >     > >     >>>> On 24/11/20 16:43, Tom Herbert wrote:
> >     > >     >>>> [....]
> >     > >     >>>>> Modulating the flow label is a means to affect the
> routing of packets
> >     > >     >>>>> through the network that uses flow labels as input to
> the ECMP hash.
> >     > >     >>>>
> >     > >     >>>> What's the point?
> >     > >     >>>>
> >     > >     >>>> 1) You cannot tell *if* the FL is being used.
> >     > >     >>>>
> >     > >     >>> Generally true, but in a limited domain this information
> could be
> >     > >     >>> discerned. I'd note that it's also generally true that
> we don't know
> >     > >     >>> if there is a load balancer or stateful firewall in the
> path that
> >     > >     >>> requires consistent routing, but in a limited domain we
> could know
> >     > >     >>> that also.
> >     > >     >>>
> >     > >     >>>> 2) Changing the FL does not necessarily mean that
> packets will employ a
> >     > >     >>>> different link.
> >     > >     >>>
> >     > >     >>> It's an opportunistic mechanism. If a connection is
> failing and we get
> >     > >     >>> a better path that fixes it by simply changing the flow
> label then
> >     > >     >>> what's the harm?
> >     > >     >>>
> >     > >     >>>>
> >     > >     >>>> 3) If the network is failing, shouldn't you handle this
> via routing?
> >     > >     >>>>
> >     > >     >>> Sure, but then that requires an out of band feedback
> loop from a TCP
> >     > >     >>> implementation to the network infrastructure to indicate
> there is a
> >     > >     >>> problem and then the network needs to respond. That's
> significant
> >     > >     >>> infrastructure and higher reaction time than doing
> something in TCP
> >     > >     >>> and IP. Think of modulating the flow label as an
> inexpensive form of
> >     > >     >>> source routing within a limited domain that doesn't need
> any
> >     > >     >>> infrastructure or heavyweight protocols or something
> like segment
> >     > >     >>> routing.
> >     > >     >>>
> >     > >     >>>>
> >     > >     >>>>
> >     > >     >>>>> The basic idea is that the flow label associated with
> a connection is
> >     > >     >>>>> randomly changed when the stack observes that the
> connection is
> >     > >     >>>>> failing (e.g. an RTO). There is nothing in the
> specs that prevents
> >     > >     >>>>> this since the source is at liberty to set the flow
> label as it sees
> >     > >     >>>>> fit.
> >     > >     >>>>
> >     > >     >>>> The FL is expected to remain constant for the life of a
> flow. A
> >     > >     >>>> retransmitted packet is part of the same flow as the
> >     > >     >>>> originally-transmitted packet. So this seems to be
> contradicting the
> >     > >     >>>> very specification of the FL.
> >     > >     >>>>
> >     > >     >>>> For instance, If a RTO for a flow causes the FL to
> change, then one may
> >     > >     >>>> possibly argue that the FL is not naming/labeling what
> is said/expected
> >     > >     >>>> to be naming/labeling.
> >     > >     >>>
> >     > >     >>> Specifically, RFC6437 states:
> >     > >     >>>
> >     > >     >>> "It is therefore RECOMMENDED that source hosts support
> the flow label
> >     > >     >>> by setting the flow label field for all packets of a
> given flow to the
> >     > >     >>> same value chosen from an approximation to a discrete
> uniform
> >     > >     >>> distribution."
> >     > >     >>>
> >     > >     >>> So that is clearly just a recommendation, and not a
> requirement (and
> >     > >     >>> definitely not a MUST). Furthermore, RFC6437 states:
> >     > >     >>>
> >     > >     >>> "A forwarding node MUST either leave a non-zero flow
> label value
> >     > >     >>> unchanged or change it only for compelling operational
> security
> >     > >     >>> reasons as described in Section 6.1."
> >     > >     >>>
> >     > >     >>> So there's no guarantee in the protocol specs that flow
> labels are
> >     > >     >>> consistent for the life of the connection, which means
> that the
> >     > >     >>> network cannot assume that and thus it would be
> incorrect if the
> >     > >     >>> network tried to enforce flow label consistency as a
> protocol
> >     > >     >>> requirement. As I said, it is prudent to try to be
> consistent with
> >     > >     >>> flow labels and the default behavior in Linux should be
> changed,
> >     > >     >>> however I do not believe there's a valid claim of
> non-conformance that
> >     > >     >>> motivates removal of the feature that is already
> deployed.
> >     > >     >>>
> >     > >     >>> Tom
> >     > >     >>>
> >     > >     >>>
> >     > >     >>>
> >     > >     >>>
> >     > >     >>>>
> >     > >     >>>>
> >     > >     >>>>
> >     > >     >>>>> The feature is useful in large datacenter networks,
> like
> >     > >     >>>>> apparently Facebook where the patches originate, since
> information
> >     > >     >>>>> discerned by TCP can opportunistically be applied to
> route selection.
> >     > >     >>>>> The practical issue is that there are stateful devices
> like firewalls
> >     > >     >>>>> that require consistent routing in the network in
> which case changing
> >     > >     >>>>> the flow label can confuse them. As I mentioned, the
> original intent
> >     > >     >>>>> was that the flow label randomization feature should
> be opt-in instead
> >     > >     >>>>> of on by default.
> >     > >     >>>>
> >     > >     >>>> So... where is the "source" of the packet that would be
> "modulating" the FL?
> >     > >     >>>>
> >     > >     >>>> Thanks,
> >     > >     >>>> --
> >     > >     >>>> Fernando Gont
> >     > >     >>>> e-mail: fernando@gont.com.ar <mailto:
> fernando@gont.com.ar> <mailto:fernando@gont.com.ar <mailto:
> fernando@gont.com.ar>> || fgont@si6networks.com <mailto:
> fgont@si6networks.com> <mailto:fgont@si6networks.com <mailto:
> fgont@si6networks.com>>
> >     > >     >>>> PGP Fingerprint: 7809 84F5 322E 45C7 F1C9 3945 96EE
> A9EF D076 FFF1
> >     > >     >>>>
> >     > >     >>>>
> >     > >     >>>>
> >     > >     >>>
> >     > >     >>> _______________________________________________
> >     > >     >>> v6ops mailing list
> >     > >     >>> v6ops@ietf.org <mailto:v6ops@ietf.org> <mailto:
> v6ops@ietf.org <mailto:v6ops@ietf.org>>
> >     > >     >>> https://www.ietf.org/mailman/listinfo/v6ops
> >     > >     >>>
> >     > >
> >     > >
> >     > > _______________________________________________
> >     > > v6ops mailing list
> >     > > v6ops@ietf.org <mailto:v6ops@ietf.org>
> >     > > https://www.ietf.org/mailman/listinfo/v6ops
> >     > >
> >     >
> >
> >     _______________________________________________
> >     v6ops mailing list
> >     v6ops@ietf.org <mailto:v6ops@ietf.org>
> >     https://www.ietf.org/mailman/listinfo/v6ops
> >
> >
> >
> > --
> > Best regards,
> > Alexander Azimov
>
>

-- 
Best regards,
Alexander Azimov