Re: ECMP and flow label [Re: 6man w.g. last call for <draft-ietf-6man-segment-routing-header-11.txt>]

Brian E Carpenter <brian.e.carpenter@gmail.com> Sat, 31 March 2018 00:45 UTC

Sender: Brian Carpenter <becarpenter46@gmail.com>
Subject: Re: ECMP and flow label [Re: 6man w.g. last call for <draft-ietf-6man-segment-routing-header-11.txt>]
To: Tom Herbert <tom@herbertland.com>
Cc: 6man <ipv6@ietf.org>
References: <20160428004904.25189.43047.idtracker@ietfa.amsl.com> <FB1C6E49-81F7-49DD-8E8B-2C0C4735071B@gmail.com> <523d27a3-285e-6bcf-2b07-2cd8d31b0915@joelhalpern.com> <CALx6S34s-XjgwYaiqFJVvPwmmUYD4qPaCv-Ku+h6q1A9aLz9hA@mail.gmail.com> <2a1a2cbe-772b-d13e-67ba-71106d40d61b@joelhalpern.com> <98bb6524-fc47-74f8-4b51-76581e43b825@joelhalpern.com> <3f900fd5-60aa-98e9-a357-f5ece5db1891@gmail.com> <CALx6S34yJsKZ8ocRWSe7iU5iwbO6OBCQFrZ1BGCiKw_=GOUxaQ@mail.gmail.com>
From: Brian E Carpenter <brian.e.carpenter@gmail.com>
Message-ID: <2e641ad6-8373-a76a-5e9c-45758cdad4b3@gmail.com>
Date: Sat, 31 Mar 2018 13:45:01 +1300
Archived-At: <https://mailarchive.ietf.org/arch/msg/ipv6/SCuNayBbrje6LBTpFbf7qUJi-vw>

Hi Tom,
On 31/03/2018 13:10, Tom Herbert wrote:
> On Fri, Mar 30, 2018 at 3:50 PM, Brian E Carpenter
> <brian.e.carpenter@gmail.com> wrote:
>> On 31/03/2018 05:38, Joel M. Halpern wrote:
>>> I responded a bit too quickly to Tom's note.
>>> I agree with the later parts of his note.  It is the ECMP behavior of
>>> intermediate routers that causes me concern.
>>> Whether his proposed fix is the right one or not is unclear, as the blog
>>> post I referred to suggested more issues.  (And I apologize for being
>>> unable to find a pointer to the blog post.)
>>
>> OK, but if this is a real issue with our RFCs, and not an implementation
>> defect, we need to fix it, independently of the current draft.
>>
>> More below...
>>
>>> Yours,
>>> Joel
>>>
>>> On 3/30/18 12:29 PM, Joel M. Halpern wrote:
>>>> Sorry for being a bit vague.  The issue with packet reordering when
>>>> using flowlabel for ECMP was discussed in a blog post from a senior
>>>> engineer (at a company other than my own) who as far as I know does not
>>>> participate in the IETF.  He went through a number of issues in that.
>>>>
>>>> There are likely ways to work around the problems.  But since the draft
>>>> does not even refer to the issue, there is a serious gap.
>>>>
>>>> Yours,
>>>> Joel
>>>>
>>>> On 3/30/18 11:59 AM, Tom Herbert wrote:
>>>>> On Fri, Mar 30, 2018 at 7:51 AM, Joel M. Halpern <jmh@joelhalpern.com>
>>>>> wrote:
>>>>>> I do not think this document is ready to be sent to the IETF and IESG
>>>>>> for final approval.
>>>>>> There are several kinds of problems.
>>>>>>
>>>>>> ECMP1: The document asserts that entropy information is put into the
>>>>>> flow label.  I wish this worked.  I helped bring this idea forward
>>>>>> years ago, independent of SRH.  Unfortunately, there are two related
>>>>>> problems.  First, adoption appears to be low.  Second, and more
>>>>>> important, there are reports from the field that doing this actually
>>>>>> breaks other things and causes packet re-ordering under some
>>>>>> circumstances.  As a result, vendors are suggesting that people turn
>>>>>> it off even when it is available.
>>>>>
>>>>> Joel,
>>>>>
>>>>> I assume you're referring to:
>>>>>
>>>>> "If an array of layer-3 adjacencies is bound to the End.X SID, then
>>>>> one entry of the array is selected based on a hash of the packet's
>>>>> header (at least SA, DA, Flow Label)."
>>>>>
>>>>> The problem with flow label isn't that it can cause OOO packets, it's
>>>>> the interaction with stateful techniques in intermediate nodes like
>>>>> firewalls and load balancers that require all packets for a flow to
>>>>> consistently hit the same stateful device. When the flow label changes
>>>>> in mid flow, packets might be routed to a different device that
>>>>> doesn't have the flow state and hence are dropped.
>>>>>
>>>>> I imagine that one could argue that SRH is intended to be used in
>>>>> closed domains so the operator could ensure there are no stateful
>>>>> nodes present that would have an issue, but that makes correct
>>>>> operation of the protocol conditional.
>>
>> If it's conditional on "source nodes MUST conform to [RFC6437]",
>> would that be a problem?
>>
> Hi Brian,
> 
> Maybe I'm missing it, but I don't think that RFC6437 plainly says that
> the flow label MUST be persistent for the lifetime of the flow.

The tricky bit is defining "flow", so essentially we settled on the idea
that a flow is whatever the source host decides is a flow. By default,
it's a given transport session. But imagine that you open
multiple simultaneous TCP sessions to the same server and want them
all to be load-balanced together. (This isn't hard to imagine.)
Then maybe a better heuristic would be to give them all the same flow
label.

So yes, there is wiggle room in the RFC. But if you do what's suggested
as a default, a stateless hash of the 5-tuple, you get a given flow label
for each entire TCP connection.
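
To make that default concrete, here is a rough sketch of a stateless
5-tuple hash that yields one 20-bit flow label per TCP connection. The
function name, the use of SHA-256, the optional secret, and the remapping
of a zero result to 1 are all assumptions for illustration; RFC 6437 does
not mandate any particular hash.

```python
import hashlib
import ipaddress

FLOW_LABEL_BITS = 20  # width of the IPv6 Flow Label field


def flow_label_from_5tuple(src, dst, proto, sport, dport, secret=b""):
    """Hypothetical helper: derive a 20-bit flow label from the 5-tuple."""
    material = (
        secret
        + ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + bytes([proto])
        + sport.to_bytes(2, "big")
        + dport.to_bytes(2, "big")
    )
    digest = hashlib.sha256(material).digest()
    label = int.from_bytes(digest[:4], "big") & ((1 << FLOW_LABEL_BITS) - 1)
    # Zero means "not labelled" in RFC 6437, so this sketch (by assumption)
    # maps a zero hash result to 1 instead.
    return label or 1


# Every packet of the same connection gets the same label, with no state kept.
first = flow_label_from_5tuple("2001:db8::1", "2001:db8::2", 6, 49152, 443)
later = flow_label_from_5tuple("2001:db8::1", "2001:db8::2", 6, 49152, 443)
```

Because the label is recomputed from header fields alone, it cannot change
in mid-flow unless the 5-tuple itself changes.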

> There are actually some interesting things that can be done by changing
> the flow label, for instance we did this in Linux for a failing
> connection to reroute the flow through the network. It worked quite
> well, but inevitably someone will hit a device that expects the flow
> label to be consistent. We ended up turning the feature off. The
> default behavior in Linux now is that the flow label is consistent for
> the life of TCP connections.
> 
>>>>> IMO the fix for the flow label/ECMP problem needs to include an update
>>>>> to RFC6437 specifying 1) the flow label can only be set by the source,
>>>>> nodes in the path MUST NOT set it,
>>
>> We debated this at great length years ago, and it was explicitly changed
>> to the rules in RFC6437, because the previous rule was unrealistic. But
>> it defines clear rules about what forwarding nodes may do.
>>
> Well, sort of clear at best.
> 
> A forwarding node may set a zero flow label to something non-zero so
> that is a changing flow label. Also, a flow may be routed through
> different devices that could set the flow label, however there's no
> reason to believe they would set the label to the same value.

Ah, right, that could happen. Sounds like a Worst Current Practice.
 
> Section 6.1 has always puzzled me. I'm not sure why the flow label
> is particularly cited as a dangerous covert channel, I would think
> something like the potential slack space in UDP would have been a much
> greater concern. In any case, the conditions that the flow label could
> be changed for security are not normatively described so I imagine
> someone might assume some leeway.

Yes. We were certainly under the impression that some firewalls at the
time simply cleared the flow label to be on the safe side.

> 
>>>>> 2) if the flow label is set, it
>>>>> MUST be an entropy identifier of the encapsulated transport flow so
>>>>> that (SA, DA, flow label) is a good tuple representation of the flow,
>>
>> Wait, "encapsulated"? The flow label refers to the header it occurs in,
>> not to an encapsulated header, which it knows nothing about.
>>
> "Encapsulated" being used in its broadest meaning. If there is a
> transport flow in the packet, the flow label represents that and the
> 3-tuple is useful for ECMP and such.

OK.

>> Ignoring this, that is the intention of RFC6437, apart from not misusing the
>> word 'entropy'. Please point to any text in RFC6437 that isn't clear enough.
>>
> I think the description is clear, however I'm not sure if the
> algorithm in Appendix A is so relevant any more.

It's not. With all due respect to von Neumann, it's not even a very
good algorithm. But it's non-normative.

> There are now many
> examples of hashes created from packet headers from which the flow
> label can be derived (we use Jenkins hash in Linux, most NICs
> implement Toeplitz, etc.). An even simpler method in TCP is to just
> create a random number for each connection that is stored in the
> connection context from which the flow label is derived.

Sure, if you don't mind adding 20 bits of state.
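
For comparison, the stateful variant might look roughly like the sketch
below: draw a random 20-bit value once when the connection is created and
keep it in the connection context. The Connection class and its fields are
hypothetical and are not taken from any real stack.

```python
import secrets
from dataclasses import dataclass, field

FLOW_LABEL_MAX = (1 << 20) - 1  # largest 20-bit flow label


def new_flow_label():
    # randbelow(FLOW_LABEL_MAX) yields 0 .. FLOW_LABEL_MAX - 1; adding 1
    # gives a non-zero label in 1 .. FLOW_LABEL_MAX.
    return secrets.randbelow(FLOW_LABEL_MAX) + 1


@dataclass
class Connection:
    """Hypothetical per-connection context holding the extra 20 bits."""
    src: str
    dst: str
    sport: int
    dport: int
    flow_label: int = field(default_factory=new_flow_label)

    def label_for_next_packet(self):
        # The stored label is reused for every packet of the connection.
        return self.flow_label


conn = Connection("2001:db8::1", "2001:db8::2", 49152, 443)
labels = {conn.label_for_next_packet() for _ in range(5)}
```

The cost is exactly the stored label; the benefit is that the label does
not depend on the choice of hash function.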

> 
>>>>> 3) the flow label MUST be the same for all packets sent on a flow
>>>>> including fragments.
>>
>> Yes, there is a problem for fragmented packets if the flow label is set by
>> a forwarding node, as mentioned at https://tools.ietf.org/html/rfc6437#page-7
>> That isn't a problem if the source node sets the flow label, which is of
>> course the primary recommendation. (Therefore, it isn't a problem
>> in the RFC 6438 scenario, ECMP in tunnels.)
>>
>> Should we be adding a BCP to that, requiring this choice:
>>
>>       *  A forwarding node might use the 2-tuple to define a flow in all
>>          cases.  In this case, subsequent load distribution would be
>>          based only on IP addresses.
> 
> Then that doesn't give us flow specific ECMP which is useful for
> tunnels. Using ECMP for flows is well deployed. The opportunity with
> the flow labels was to give the same functionality without routers
> needing to resort to DPI to find transport headers for the hash.

Sure, absent fragmentation. But with fragmentation, we will always have
problems (and DPI will fail).

If the flow label is zero and a fragment header is present, what else
can you do but use the 2-tuple?
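
A rough sketch of the decision I have in mind for a forwarding node
follows. The function names, the SHA-256 hash, and the optional "ports"
argument (standing in for whatever DPI might find) are assumptions for
illustration only, not text from any RFC.

```python
import hashlib
import ipaddress


def ecmp_key(src, dst, flow_label, is_fragment, ports=None):
    """Hypothetical helper: choose the header fields that feed the ECMP hash."""
    addrs = ipaddress.IPv6Address(src).packed + ipaddress.IPv6Address(dst).packed
    if flow_label != 0:
        # 3-tuple: source address, destination address, flow label.
        return addrs + flow_label.to_bytes(3, "big")
    if not is_fragment and ports is not None:
        # Unlabelled, unfragmented packet: transport fields may be reachable.
        proto, sport, dport = ports
        return addrs + bytes([proto]) + sport.to_bytes(2, "big") + dport.to_bytes(2, "big")
    # Zero flow label plus a fragment header: only the 2-tuple is left.
    return addrs


def pick_next_hop(next_hops, key):
    """Map the key onto one of the equal-cost next hops."""
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]


hops = ["nh-a", "nh-b", "nh-c"]
key = ecmp_key("2001:db8::1", "2001:db8::2", 0, True)
chosen = pick_next_hop(hops, key)
```

With a zero label, every fragment between the same pair of addresses lands
on the same next hop, whichever transport flow it belongs to.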

    Brian

> 
> Tom
> 
>> ?
>>     Brian
>>
>>>>> Tom
>>>>>
>>>>>> ECMP2: If, with suitable analysis, we decide that this is actually
>>>>>> safe to do, then the document needs to include both that analysis and
>>>>>> clearly spelled out requirements that operators MUST verify that all
>>>>>> their routers support this behavior and that they have enabled this
>>>>>> behavior before turning on SRv6 usage.
>>>>>>
>>>>>> Structure: The text in section 2.3 says "It is assumed in this
>>>>>> document that the SRH is added to the packet by its source".  This is
>>>>>> at best disingenuous.  It is very clear that the value of this
>>>>>> behavior lies primarily in use by the network, not by the packet
>>>>>> sources.  Claiming otherwise results in a document with minimal
>>>>>> utility.  Even this document itself disagrees with this assertion.
>>>>>> Tucked into the very next section is text saying that "outer header
>>>>>> with an SRH applied to the incoming packet".  If this behavior were a
>>>>>> clearly spelled out requirement, rather than a "typically" and if the
>>>>>> text in 2.3 were replaced with something realistic, then the document
>>>>>> would at least be internally consistent and match the expected usages.
>>>>>>
>>>>>> Edge filtering: The text on edge filtering does not actually state that
>>>>>> prevention of packets with SRH and a current DA of an internal node is
>>>>>> mandatory.  Unless it is clearly stated, the security considerations
>>>>>> text as currently written is significantly weakened.  If it is
>>>>>> mandatory, then again the deployment section needs to note that an
>>>>>> operator needs to verify that all of his edge devices support such
>>>>>> filtering and have it properly enabled in order to use SRv6.
>>>>>>
>>>>>> Edge Filtering and hybrids: Other documents have talked about allowing
>>>>>> external packets with SRv6 entries pointing to internal nodes (which
>>>>>> means the DA upon arrival at the operator edge will be an internal
>>>>>> node as I understand it).  If the intention is to permit that with
>>>>>> appropriate security, then the edge filtering requirements need to be
>>>>>> clear about the requirements for cryptographic validation at the edge.
>>>>>>
>>>>>> Yours,
>>>>>> Joel
>>>>>>
>>>>>>
>>>>>> On 3/29/18 4:30 PM, Bob Hinden wrote:
>>>>>>>
>>>>>>>
>>>>>>> This message starts a two week 6MAN Working Group Last Call on
>>>>>>> advancing:
>>>>>>>
>>>>>>>          Title           : IPv6 Segment Routing Header (SRH)
>>>>>>>          Authors         : Stefano Previdi
>>>>>>>                            Clarence Filsfils
>>>>>>>                            John Leddy
>>>>>>>                            Satoru Matsushima
>>>>>>>                            Daniel Voyer
>>>>>>>          Filename       : draft-ietf-6man-segment-routing-header-11.txt
>>>>>>>          Pages          : 34
>>>>>>>          Date           : 2018-03-28
>>>>>>>
>>>>>>>
>>>>>>> https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header
>>>>>>>
>>>>>>> as a Proposed Standard.  Substantive comments and statements of support
>>>>>>> for publishing this document should be directed to the mailing list.
>>>>>>> Editorial suggestions can be sent to the author.  This last call will
>>>>>>> end on 12 April 2018.
>>>>>>>
>>>>>>> An issue tracker will be set up to track issues raised on this document.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Bob & Ole
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------------------------
>>>>>>> IETF IPv6 working group mailing list
>>>>>>> ipv6@ietf.org
>>>>>>> Administrative Requests: https://www.ietf.org/mailman/listinfo/ipv6
>>>>>>> --------------------------------------------------------------------
>>>>>>>