Re: [tsvwg] New Version Notification for draft-herbert-tcp-in-udp-00.txt

Michael Welzl <michawe@ifi.uio.no> Sat, 19 August 2023 15:00 UTC

From: Michael Welzl <michawe@ifi.uio.no>
Message-Id: <9DD1F7A9-8087-4898-9618-802FDBDA4607@ifi.uio.no>
Date: Sat, 19 Aug 2023 17:00:22 +0200
In-Reply-To: <223D67DE-A10D-4076-93BD-34A000FE284C@gmx.de>
Cc: Tom Herbert <tom@herbertland.com>, tsvwg <tsvwg@ietf.org>
To: Sebastian Moeller <moeller0@gmx.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/V9XsluwDPAW6ujIxwALetEuAnnA>
Subject: Re: [tsvwg] New Version Notification for draft-herbert-tcp-in-udp-00.txt

Hi !

I’ll point out that we have converged on what the draft should say, and paste the relevant exchange up here:

>> It lets endpoints make a conscious decision between load balancing and cc coupling, and, when used, it increases the chance for NATs to keep their state intact. What’s not to like?
> 
> 	[SM] Oh, I think we might agree more than it looks, I think the draft could simply recommend to use a "fixed" source port if a coupled CC is used, otherwise making the src port reflect the underlying TCP flow identity should be the recommended action. 

I’m ok with that - so we could stop here.
I will draw an ornamental line to make that clear.

=======================================================================================
§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§
* * * * * * * * * * * * * * * * *                               * * * * * * * * * * * * * * * * * * * * *                               * * * * * * * * * * * * * *
§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§
=======================================================================================
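
(Purely for concreteness, and explicitly not draft text: a minimal sketch of the source-port policy agreed above. The helper name, the fixed port and the hash are all made up, and it only handles IPv4, for brevity.)

import socket, struct, zlib

def outer_udp_source_port(inner_4tuple, coupled_cc):
    """Pick the outer UDP source port for a TCP-in-UDP encapsulated segment.
    inner_4tuple: (src_ip, src_port, dst_ip, dst_port) of the inner TCP connection.
    coupled_cc:   True if this connection is part of a coupled-cc group."""
    FIXED_PORT = 50000                                # illustrative fixed port per group
    EPHEMERAL_BASE, EPHEMERAL_RANGE = 49152, 16384    # dynamic port range

    if coupled_cc:
        # One outer 5-tuple for the whole group: load balancers and NATs
        # see a single flow, so the coupled connections stay on one path.
        return FIXED_PORT

    # Otherwise reflect the inner TCP flow identity, so per-flow load
    # balancing and flow queuing keep working as they do for native TCP.
    key = struct.pack("!4sH4sH",
                      socket.inet_aton(inner_4tuple[0]), inner_4tuple[1],
                      socket.inet_aton(inner_4tuple[2]), inner_4tuple[3])
    return EPHEMERAL_BASE + (zlib.crc32(key) % EPHEMERAL_RANGE)

# e.g. outer_udp_source_port(("192.0.2.1", 34567, "198.51.100.7", 443), coupled_cc=False)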


However, just cutting off the conversation like this seems impolite to me - so I’ll still try to summarize the points below; maybe then the discussion can end on a more constructive note altogether.
Since we have now agreed on wording, your arguments against using the same UDP 5-tuple for multiple connections must relate to my original, somewhat stronger proposal that the same 5-tuple be the default rather than the exception. Well, I don’t insist on this, but let’s take it from there anyway. I asked for reasons *against* using the same 5-tuple, and you wrote:

>	[SM] I thought that was clear, for a flow queuing scheduler to work best it needs to see individual flows… 
and:
>	[SM] Well, if this travels over a fq scheduler the whole tunneled traffic will appear as a single flow and under congestion (and that is what this is all about) it will only get a single flow's share of bottleneck capacity... that is a disadvantage for coupled CC traffic, and it also counteracts the actual flow isolation at the scheduler (which, assuming your coupled CC scheduler is decent might not matter that much).

… to which I say: are you telling me that, if I open 10 connections and you open 1, I *should* get 10 times more capacity than you?

What a flow queuing scheduler can do, for multiple separate flows originating from the same host, is to protect them from each other. However, that is a way of making a network element do the Operating System’s job - the host is in a much better position to get this right, and it has more information available. “Fixing” things between a host’s own flows in the network is really the wrong place to do it, as it causes pathologies like the capacity share being a function of the number of open connections. That’s what the Congestion Manager proposal tried to fix so many years ago (http://www.nms.lcs.mit.edu/cm/, and RFC 3124). “Individual flows” at the network layer should ideally be one per host, not one per application. So, if anything, lumping more connections together under the same 5-tuple makes flow queuing schedulers work *better*, not worse! (Also, perhaps a minor point: fewer “flows” (tuples) = fewer hash collisions.)
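
(To put numbers on the capacity-share point: a toy calculation, not a real scheduler - just the arithmetic of how a per-5-tuple fair queue divides a bottleneck, compared to cake-style per-host sharing. Names and numbers are purely illustrative.)

def bottleneck_shares(flow_counts, per_host=False):
    """Toy fair-share arithmetic for one bottleneck.
    flow_counts: {host: number of 5-tuple flows it has through the bottleneck}
    per_host=False -> classic per-flow (per-5-tuple) fair queuing
    per_host=True  -> capacity split per host first (cake-style dual isolation)"""
    if per_host:
        return {h: 1 / len(flow_counts) for h in flow_counts}
    total = sum(flow_counts.values())
    return {h: n / total for h, n in flow_counts.items()}

print(bottleneck_shares({"me": 10, "you": 1}))                 # {'me': 0.909..., 'you': 0.0909...}
print(bottleneck_shares({"me": 10, "you": 1}, per_host=True))  # {'me': 0.5, 'you': 0.5}
# Lumping my 10 connections under one outer 5-tuple gives the plain per-flow
# scheduler the second result as well: 0.5 each.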

Now, just for completeness, we can discuss the research that you think would be advisable - also because I do appreciate the request for data to prove a point, in general.


1) My misunderstanding:
===================
I thought you meant that it would be advisable to investigate single-path coupled cc in the face of multiple network bottlenecks. Investigating that is what I called “nonsense”, but I now understand that this is not what you meant. Sorry!


2) What you really meant:
====================
You have made it clear that you’re not convinced that traversing different paths necessarily also means traversing different bottlenecks. I agree with that!  I’ll quote your suggestion:

"start by pretending fate is shared, and this will work more or less well for short flows as well, assuming that this speculative initial fate-sharing was correct or incorrect and whether coupled CC is tolerant to some participating flows not really sharing the same fate. That is why I ask for how important is that fate-sharing for coupled CC to work. Given the above, I am not convinced that load balancers actually are that much of a problem (unless the bottlenecks happen only after pathes split after the load balancers)."

Right; it’s not a bad idea! Is this research worthwhile to do **in support of the “TCP-in-UDP uses the same 5-tuple” idea**, however? I say no, because:
(note, the asterisks stress the focus on this design idea alone - please bear with me: I do think it’s relevant research in a more general sense, see item 3 below).

a) such an approach will never be 100% reliable, whereas using the same 5-tuple is (100% here meaning: “yielding the same behavior as seen by a single cc instance today”), and at no perceivable disadvantage (see above for why I think you’re wrong about flow queuing schedulers).

b) if the point is to convince people, with data, that it would work, then I already have the experience from the RFC 9040 discussions that such data wouldn’t convince e.g. Google (and probably, similarly, it wouldn’t convince other big companies). We didn’t have that data back then, but it became clear that, with or without data, they wouldn’t want more complex machinery that *might* sometimes fail in their servers (remember, this is a sender-side operation).

c) such an approach will necessarily have to be more conservative than a design where one can rely on the same 5-tuple. Using my example again, an existing flow could have a cwnd of e.g. 100 packets, and when a new flow joins, it could even be assigned e.g. 90 of these 100 in one go, depending on how priorities are set. That’s a massive leap of the congestion window, which is surely too risky when one cannot really be certain about sharing the same bottleneck.
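
(Numerically, the leap in c) looks like this - a toy, priority-driven re-allocation of the aggregate cwnd; the function, the priorities and the numbers are made up for illustration:)

def reallocate_cwnd(cwnds, priorities):
    """Redistribute one aggregate cwnd across coupled flows by priority
    (toy version of what a coupled-cc flow-state exchange would do)."""
    aggregate = sum(cwnds.values())
    total_prio = sum(priorities.values())
    return {f: round(aggregate * priorities[f] / total_prio) for f in cwnds}

before = {"old": 100, "new": 0}          # existing flow has built up cwnd = 100 packets
after = reallocate_cwnd(before, priorities={"old": 1, "new": 9})
print(after)                             # {'old': 10, 'new': 90}
# The new flow jumps to 90 packets in its first RTT - safe if (and only if)
# both flows really do share the same bottleneck.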


3) Is research on “single-path coupled cc on traffic that **may** not actually traverse the same bottleneck” worthwhile, in general?
=================================================================================================
Simply: yes. Mainly, I see interesting possibilities for the outgoing traffic of a household: even when it goes to different destinations, it might all share the same bottleneck, and perhaps even congestion controllers on different hosts could be coupled (since the latency within the household should be very low). That could yield quite large gains. But now we’re talking about highly experimental research ideas, quite far from the engineering that TCP-in-UDP is - and this is in fact one of my project proposals that never got funded… so… this is not happening, at least not for me. If someone else wants to do it and is interested in collaborating, get in touch  :-)

Altogether, many thanks for your interest and the inspiring points you shared; I hope that I managed to clear things up a little.


Cheers,
Michael





> On Aug 18, 2023, at 3:43 PM, Sebastian Moeller <moeller0@gmx.de> wrote:
> 
> 
> 
>> On Aug 18, 2023, at 11:07, Michael Welzl <michawe@ifi.uio.no> wrote:
>> 
>> 
>> 
>>> On 18 Aug 2023, at 10:15, Sebastian Moeller <moeller0@gmx.de> wrote:
>>> 
>>> Hi Michael,
>>> 
>>> 
>>>> On Aug 18, 2023, at 09:59, Michael Welzl <michawe@ifi.uio.no> wrote:
>>>> 
>>>> 
>>>>> On 18 Aug 2023, at 08:24, Sebastian Moeller <moeller0@gmx.de> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Aug 17, 2023, at 21:18, Michael Welzl <michawe@ifi.uio.no> wrote:
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Aug 17, 2023, at 9:15 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Aug 17, 2023, 12:09 PM Michael Welzl <michawe@ifi.uio.no> wrote:
>>>>>>> Hi !
>>>>>>> 
>>>>>>> About the flow label:
>>>>>>> 
>>>>>>> 
>>>>>>>> Within the network, the flow label serves the same function as how devices are using the ports in UDP encapsulation - in both cases they are used to mark packets as belonging to the same flow.
>>>>>>>> 
>>>>>>>> A "flow" in this context is purposely ill-defined; it does not have to correspond one-to-one to a transport flow. So your idea of combining TCP flows into a mega flow for purposes of network visibility is a valid use case;
>>>>>>> 
>>>>>>> … but it doesn’t work. Some routers do hash over transport ports + flow label + IP addresses (and who knows what else), and so we saw that, between the same host pair, packets using different ports but the same flow label can take different paths.
>>>>>>> 
>>>>>>> That's up to the router. Some routers do use flow labels; some packets don't even have port numbers, or they're too deep in the packet.
>>>>>>> 
>>>>>>> We only need to define how things like flow label and port numbers are set, not how they must be used by intermediate nodes.
>>>>>> 
>>>>>> Well yes, but because of that, one just cannot rely on the flow label alone as a way to “pin down” the route. Equal UDP ports for different encapsulated TCP connections *are* needed for this to work. Combined congestion control is about traversing the same bottleneck.
>>>>> 
>>>>> 	[SM] Why? The endpoint running the connection manager surely can aggregate different TCP connections into one shared cwin aggregate, no? After all the flows need to start and terminate at the same IP addresses so will be identifiable... as far as I can see same outer tunnel flow ID can be a helpful shortcut, but seems not to be a strict requirement for coupled CC?
>>>> 
>>>> The reason is that a single path is only guaranteed (as much as it’s “guaranteed”, and hence assumed by, all single-path congestion control - of course paths can still change, etc.) when packets have a common 5-tuple. Indeed we put multiple connections together into a shared cwnd aggregate, but this only makes sense if they traverse the same bottleneck.
>>> 
>>> 	[SM] Well, perfect being the enemy of good (enough),
>> 
>> That’s not what this is:
>> 
>> 
>>> so this looks like a field were more research is advisable.
>> 
>> No, because it’s just totally wrong. Look, out of 3 packets, one can traverse bottleneck 1, one can traverse bottleneck 2, one can traverse bottleneck 3. A single congestion control instance just doesn’t make any sense for that, and research on nonsense is not advisable.
> 
> 	[SM] Yes, such divergent paths seem theoretically possible; my question is how likely this scenario is, given that a considerable number of internet users are mostly limited by their own internet access (so the bottleneck will already be predicted by the NATed IPv4 and IPv6 prefix)... I might be wrong, but I think what mainly determines a flow's cwin and cwin dynamics over a congested/limited path is the bottleneck capacity share of that flow and the RTT; the actual endpoint should not really matter all that much. So your quest of avoiding load balancing really just serves as a proxy for "these flows share a common bottleneck", correct? 
> A load balancer that happens on either side of the bottleneck should not really matter (unless it affects the RTT, but that should be trivial to check, after all TCPs need to maintain individual RTT estimates, no?).
> 
> I respectfully maintain, that more research seems desirable about how coupled CCs operate under "normal" existing-internet conditions.
> 
> 
> 
>> 
>> 
>>> So how does coupled CC work when the assumption "single-path" is not fully correct. Which as you state is never fully guaranteed anyway.
>> 
>> And, load balancing is happening plenty when ports are different - surely not hard to dig up measurement papers that show this.
> 
> 	[SM] How prevalent load-balancing is, is not my question; my question is how much a realistic level of load balancing compromises the utility of coupled CC. This is IMHO a relevant research question that proponents of coupled CC might want to consider. The answer might well be that this is catastrophic and hence a fully deterministic shared outer flow id is required. I expect, however, that it will take more than this for coupled CC to lose its usefulness (given my limited understanding of what should affect cwin dynamics).
> 
> 
> 
>> 
>> Here’s a different angle to this: RFC 9040 is about coupling information across connections too, but not at the same level as coupled cc (instead, only to initialize). We (authors) tried to lobby for more coupling because this is beneficial when it works, and colleagues from Google were strongly opposed to this because of load balancing, and the reality that “connections with different ports take different paths”.
>> 
>> So, quite simply, without the same ports, we really can’t do this, period.
> 
> 	[SM] Which I, again with all respect, am not convinced of. Unless you already tried and it failed, in which case I will follow the data.
> 
>> 
>> 
>>>> An alternative to using the same 5-tuple is to measure whether there is a common bottleneck - we have also done work on this. Our latest and most thorough paper on this topic is:
>>>> David Hayes, Michael Welzl, Simone Ferlin, David Ros, Safiqul Islam: "Online Identification of Groups of Flows Sharing a Network Bottleneck", IEEE/ACM Transactions on Networking 28(5), pp. 2229-2242, Print ISSN: 1063-6692, Online ISSN: 1558-2566 October 2020. DOI 10.1109/TNET.2020.3007346.
>>>> https://ieeexplore.ieee.org/document/9161279?source=authoralert
>>>> Preprint: https://folk.universitetetioslo.no/michawe/research/publications/sbd_ton.pdf
>>>> 
>>>> … and there’s also RFC 8382.  However: this is not fully reliable, and it requires connections to be relatively long - which is perhaps appropriate for WebRTC (which RFC 8382 was written for), but is not at all the case for most other Internet traffic. With (and only with) a common 5-tuple, single-path coupled cc. can be instantly applied.
>>> 
>>> 	Maybe... I think a common 2-tuple (src/dst address) will already be quite deterministic, the question is, is this not already good enough for coupled CC to deliver on its promises? Say, start by assuming fate sharing by 2-tuple and run the "Online Identification of Groups" to confirm whether that initial decision was good enough or not... if not, de-share the congestion control again?
>> 
>> That’s exactly the argument that didn’t fly for RFC 9040. See my next statement for a reason:
>> 
>> 
>>> That said, for a fully coupled CC world an FQ scheduler would essentially operate on 3-tuples, something that has been argued as a suitable "flow-granularity" for deeper network nodes... but it really puts the burden on the coupledCC implementation to not screw things up regarding flow mixing and inter-flow scheduling.
>> 
>> Coupled cc won't get enough information to ever be able to do the right thing like this for short flows, when these flows (as is the case for the large majority of Internet connections) terminate in slow start, without experiencing congestion. Yet, without coupled cc., they may easily waste more round-trips than would have been needed. It’s really nothing that more research can fix.
> 
> 	[SM] As I said start by pretending fate is shared, and this will work more or less well for short flows as well, assuming that this speculative initial fate-sharing was correct or incorrect and whether coupled CC is tolerant to some participating flows not really sharing the same fate. That is why I ask for how important is that fate-sharing for coupled CC to work. Given the above, I am not convinced that load balancers actually are that much of a problem (unless the bottlenecks happen only after paths split after the load balancers).
> 
> 
>> 
>> On the other hand, why are you even opposed to using the same 5-tuple?
> 
> 	[SM] I thought that was clear, for a flow queuing scheduler to work best it needs to see individual flows... 
> 
>> It’s reliable, easy with TCP-in-UDP, and I can’t see any disadvantage with it anyway.
> 
> 	[SM] Well, if this travels over a fq scheduler the whole tunneled traffic will appear as a single flow and under congestion (and that is what this is all about) it will only get a single flow's share of bottleneck capacity... that is a disadvantage for coupled CC traffic, and it also counteracts the actual flow isolation at the scheduler (which, assuming your coupled CC scheduler is decent might not matter that much).
> 
> 
>> 
> 
> 
>> 
>> 
>>> Tangent: for home networks one of cake's recommended isolation-modes is one where first capacity is shared equitably between active internal IP addresses and only then (within each IP's capacity share) based on 5-tuple flows. That mode would give coupled CC meta-flows a more "equitable capacity share" than a pure 5-tuple flow isolation. That however is so far unique to cake and fq_codel does not implement that at all.
>> 
>> What a handful of devices do is irrelevant. Even if many devices would do it, it would be irrelevant:
> 
> 	[SM] You are missing my point, I think. This tangent shows how coupled CC does not need to suffer unduly even on a fq-scheduler assuming that scheduler does not do strict capacity sharing based on 5-tuple information.
> 
> 
> 
>> as long as there is a non-negligible number of routers out there that carry out load balancing using the 5-tuple, one cannot use single-path cc. coupling with multiple ports.
> 
> 	[SM] Assuming that coupled CC can not tolerate the expected level of load balancing (where the load balancing needs to happen before bottlenecks). I wonder for an on-path bottleneck, does path diversion after the bottleneck really matter? 
> 
> 
> Regards
> 	Sebastian
> 
> 
>> 
>> Cheers,
>> Michael