Re: [tsvwg] ECN encapsulation draft - proposed resolution

Markku Kojo <kojo@cs.helsinki.fi> Tue, 22 June 2021 14:02 UTC

Return-Path: <kojo@cs.helsinki.fi>
X-Original-To: tsvwg@ietfa.amsl.com
Delivered-To: tsvwg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id A74F33A2629 for <tsvwg@ietfa.amsl.com>; Tue, 22 Jun 2021 07:02:32 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2
X-Spam-Level:
X-Spam-Status: No, score=-2 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=cs.helsinki.fi
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id NiJ-Xa7BDqEO for <tsvwg@ietfa.amsl.com>; Tue, 22 Jun 2021 07:02:27 -0700 (PDT)
Received: from script.cs.helsinki.fi (script.cs.helsinki.fi [128.214.11.1]) (using TLSv1.2 with cipher AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 7EC873A2654 for <tsvwg@ietf.org>; Tue, 22 Jun 2021 07:02:18 -0700 (PDT)
X-DKIM: Courier DKIM Filter v0.50+pk-2017-10-25 mail.cs.helsinki.fi Tue, 22 Jun 2021 17:02:12 +0300
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cs.helsinki.fi; h=date:from:to:cc:subject:in-reply-to:message-id:references :mime-version:content-type:content-id; s=dkim20130528; bh=74hHJg NV3aFwJD0SSK3tm7FljAJP7YjEukKixQ4VjAk=; b=bnwYe1nzmaBN52BGdcA7va Ae8Glqx5Q7uh/D3CEDf2gBEeNKU5dv4u5yEXVCT8r7WU0mJPlGuPyBX0GeF3soP2 iBOgnaBIyQ//7wWwsJAjoS1T33cAzRjSXIRdl2EtaIuP01MVfL3Frzihr+j0kYT+ cgX+f5AJqKO008xwQmXZ8=
Received: from hp8x-60 (85-76-2-69-nat.elisa-mobile.fi [85.76.2.69]) (AUTH: PLAIN kojo, TLS: TLSv1/SSLv3,256bits,AES256-GCM-SHA384) by mail.cs.helsinki.fi with ESMTPSA; Tue, 22 Jun 2021 17:02:11 +0300 id 00000000005A1C6D.0000000060D1ED63.00004C37
Date: Tue, 22 Jun 2021 17:02:09 +0300
From: Markku Kojo <kojo@cs.helsinki.fi>
To: Bob Briscoe <ietf@bobbriscoe.net>
cc: Jonathan Morton <chromatix99@gmail.com>, Donald Eastlake <d3e3e3@gmail.com>, John Kaippallimalil <kjohn@futurewei.com>, Markku Kojo <kojo=40cs.helsinki.fi@dmarc.ietf.org>, "tsvwg@ietf.org" <tsvwg@ietf.org>
In-Reply-To: <3a66effa-9269-a9b0-48e8-d48bd46d70d1@bobbriscoe.net>
Message-ID: <alpine.DEB.2.21.2106221542420.4160@hp8x-60.cs.helsinki.fi>
References: <MN2PR19MB40454BC50161943BC33AAAD783289@MN2PR19MB4045.namprd19.prod.outlook.com> <43e89761-d168-1eca-20ce-86aa574bd17a@bobbriscoe.net> <de8d355d-08b6-34fb-a6cc-56755c9a11ee@bobbriscoe.net> <MN2PR19MB4045DB9D2C45066AEB0762DB83259@MN2PR19MB4045.namprd19.prod.outlook.com> <alpine.DEB.2.21.2106021717300.4214@hp8x-60.cs.helsinki.fi> <BE497F82-5452-41A1-943F-7ABD0048C7F9@gmail.com> <56c2887b-5e9e-c2b6-c760-81e2627400a2@bobbriscoe.net> <3a66effa-9269-a9b0-48e8-d48bd46d70d1@bobbriscoe.net>
User-Agent: Alpine 2.21 (DEB 202 2017-01-01)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=_script-19536-1624370532-0001-2"
Content-ID: <alpine.DEB.2.21.2106221603220.4160@hp8x-60.cs.helsinki.fi>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/Sqn0g89rvTuuocv9zY7ROIYLPXs>
Subject: Re: [tsvwg] ECN encapsulation draft - proposed resolution
X-BeenThere: tsvwg@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Transport Area Working Group <tsvwg.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tsvwg>, <mailto:tsvwg-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tsvwg/>
List-Post: <mailto:tsvwg@ietf.org>
List-Help: <mailto:tsvwg-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tsvwg>, <mailto:tsvwg-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 22 Jun 2021 14:02:33 -0000

Hu Bob,

I am replying separately to the messges you addressed to me. I am not 
going analyse the maths by you or Jonathen more deeply as they 
both contain errors. Just pointing out a few things below which might 
help understanding my points in my other replies which do not contain 
much math.

Please see inline.

On Tue, 22 Jun 2021, Bob Briscoe wrote:

> Jonathan,
>
> Just checking that you have seen my response below. I spent a couple of days 
> trying to write it in the way that you seem to be thinking about this. So 
> we'll need you to respond if we are going to move forward. I appreciate 
> there's a lot here, so it needs some time to work through it all.
>
>
> Bob
>
>
> On 15/06/2021 23:25, Bob Briscoe wrote:
>> Jonathan,
>> 
>> I waited before replying to your email below until I had the time to try to 
>> unpick and deeply understand the way you think about all this. I would 
>> appreciate it if you would do likewise, and certainly please do not snip 
>> anything I say (like you did last time I put in a similar amount of effort 
>> into explaining all this on your terms). Instead, please try to understand 
>> and address each point.
>> See [BB] inline.
>> 
>> On 04/06/2021 02:02, Jonathan Morton wrote:
>>>> On 3 Jun, 2021, at 2:41 pm, Markku Kojo 
>>>> <kojo=40cs.helsinki.fi@dmarc.ietf.org> wrote:
>>>> 
>>>> I'm afraid there is an open/ongoing discussion on this where we have not 
>>>> reached concensus.
>>> I think I agree with your analysis that preserving the number of bytes 
>>> marked is probably the wrong thing to do.  The implementation complexity 
>>> argument is fairly compelling in itself, particularly noting that there 
>>> isn't any known "running code" which would place some upper bound on what 
>>> the complexity is.
>>> 
>>> By contrast the RFC-3168 approach of "if any part of the packet is marked, 
>>> all of it is marked" is very simple to implement, and this is proved by 
>>> actual implementations in the wild.  This approach is also consistent with 
>>> the behaviour when ECN is not in use, since if any fragment of a packet is 
>>> lost, the packet can't be reassembled and is discarded in whole.
>>> 
>>> I'd like to try and back this position up with some mathematical rigour 
>>> from a different direction.
>>> 
>>> One thing I noticed about Bob's position is that he's trying to preserve 
>>> the number of bytes marked, but usually the transport is actually 
>>> sensitive to the number of bytes, or possibly the interval of time, 
>>> *between* marks.  These are two (or three) very different quantities, and 
>>> it's important not to get them confused in the debate.
>>> 
>>> In engineering, a very useful tool for checking whether an answer is 
>>> correct is "dimensional analysis".  Simply put, work out the units that 
>>> your inputs and outputs are most naturally expressed as, and check that 
>>> the formulae linking them are consistent with respect to the units.  In 
>>> this case, we can look at the units of the AQM's marking function, the 
>>> units of the transport's congestion control steady state criterion, and 
>>> whether each approach to resolving ECN marks at reassembly preserves the 
>>> quality of the marking with respect to those units.
>>> 
>>> For the purpose of this analysis, we are looking for the quantity that is 
>>> most nearly preserved if we hold the AQM state constant or the transport 
>>> in steady state, but vary packet size and/or link bandwidth arbitrarily.
>>> 
>>> Starting with the AQMs:
>>> 
>>> 1: Codel is a time-domain AQM; its marking function is explicitly a 
>>> frequency, expressed for performance reasons as a time interval between 
>>> marks.  For analysis, we can equivalently express this quantity in units 
>>> of marks/sec or its reciprocal, seconds per mark.  If we change the 
>>> bandwidth or the packet size while holding Codel's control law constant, 
>>> the number of bytes in marked packets and/or the number of packets between 
>>> marks will change, but the time between marks will stay constant.
>>> 
>>> 2: A packet-domain marking function (as used in BLUE and many RED-family 
>>> AQMs) maintains a probability that any given packet will be marked.  
>>> Amortised over a large interval, we can express this as a (fractional) 
>>> number of marks per packet, or equivalently as packets between marks. If 
>>> the packet rate increases, so will the number of marks per time interval; 
>>> if the packet size decreases, so will the number of bytes between marks.
>>> 
>>> 3: A byte-domain marking function, as Markku described for early RED work, 
>>> is obtained by taking a nominally packet-mode probability function, but 
>>> multiplying the probability by bytes/packet (and a constant scale 
>>> factor).  The resulting unit is (fractional) marks per byte, or bytes 
>>> between marks.
>> 
>> [BB] One doesn't see flow rate formulae for Reno (or DCTCP, Cubic, etc) 
>> that say, "the rate is x if over PIE; but the rate is y if over CoDel". The 
>> outcome solely depends on the transports involved, which converge on a set 
>> of rates that fill the capacity irrespective of which AQM is at the 
>> bottleneck. Therefore the unit that a particular AQM uses to mark packets 
>> does not alter the steady-state outcome. This is a critical point, so if 
>> you disagree, there's no need to move on to the next part about transports 
>> - please stop here and solely explain why you think this point is wrong.

[MK] Actually the flow rate should not depend on the AQM, but in practise 
it does. This is because an AQM is used to control the (virtual) queue 
size on the bottleneck AQM node. It depends on how the AQM is configured, 
that is, at which queue level (queue size) it drops/marks the first packet 
after entering its main operating range as well as how it drops/marks 
after that.

If an AQM drops/marks a packet (on avarage) at a different the queue level 
(queue size) depending on the size of the packets, then the outcome 
depends very heavily on the AQM. Just think a single flow. The queue size 
at the time when the AQM marks/drops a packet controls directly what is 
the cwnd size when the congestion signal arrives. Thereby, a lower queue 
size means lower cwnd size and also lower cwnd size after halving 
the cwnd and thereby lower avg cwnd that results in lower flow rate (if 
cwnd has lower value than BDP after it was halved; if it does not, then 
the AQM has been configured more towards buffer bloat than a reasonable 
virtual queue size). In Reno formula this means that the queue size 
affects the avarage RTT in the Reno formula as well as 'p'.

>> 
>> The fact that CoDel inserts an amount of time between marks is a 
>> distraction. CoDel's control law decreases the time between marks until it 
>> balances with the transports using it.

[MK] Please stop here thinking a single flow case. CoDel's control law is 
never applied with a single flow in congestion avoidance beçause after the 
first mark/drop cwnd is halved and the sojourn time falls below Target. 
Next time a packet is marked/dropped only after several RTTs once the 
cwnd has been increased enough and the queue level (size) has reached the 
point where CoDel marks/drops a packet (i.e., once sojourn time has been 
above Target for a period of Interval). The decreasing marking/dropping 
interval in CoDels control low allows it to react faster when there are 
enough flows sharing the bottleneck such that a single drop/mark does not 
take sojourn time below Target.

>> The fact that RED, BLUE, PIE etc use 
>> a packet marking probability is also a distraction. They also adapt, 
>> increasing the marking probability to converge on 'the same outcome' as 
>> CoDel would, if placed in the same location with the same traffic sources.

[MK] RED, BLUE, PIE etc. with same configuration should also mark/drop a 
single packat per Std Track congestion control sawtooth cycle. So, the 
timing (number of bytes, not packets) between marks/drops matters.
If you decrease the timing, it results in decreased flow rate. And if 
small packets have the same probability to get marked/dropped as large 
packets, the mark/drop occurs earlier, that is, in shorter time, for 
small packets than for large packets.

>> By 'the same outcome', I mean that if you replaced one AQM at the 
>> bottleneck with another, the outcome would be:
>> * the same number of marked packets per second (or equivalently the same 
>> time between marks)
>> * AND the same marking probability.
>> 
>> If it existed, byte-mode marking (the early RED option) would have the same 
>> property, but let's only talk about that if we need to, when we get to 
>> re-framing, where it does make a difference.

[MK] Byte-mode marking gives lower probability for small packets, so it 
postpones marking a small packet such that (on the average) the mark 
occurs approximately after same period of time (after same number of 
bytes) regardless of packet size.

[SNIP]

>> ==Errors in your analysis of transports==
>> 
>> Your analysis of DCTCP is OK, but your analyses of Reno and Cubic contain 
>> errors:
>> 
>> * Reno: Given you define BDP in bytes, your criterion for RTTs per mark 
>> should be (BDP/MSS)/2 , not BDP/2.
>>     and a picky point: the const should be approx 2/3 not 1/2.

[MK] Incorrect. The const depends on the amount of (virtual) queueing 
configured and number of flows. If the bottleneck (with or without AQM) 
is configured to allow queue that equals to BDP (i.e., mark occurs when 
BDP amount of bytes have been queued), then a single flow gets full 
utilization of the bottleneck and the max cwnd = W = 2*BDP and cwnd after 
cwnd reduction is W/2. So, the time needed to increase cwnd from W/2 
(=BDP) to W is W/2 = (2*BDP/MSS)/2 = BDP/MSS RTTs. If the bottleneck is 
configures to have no queue, the time needed to increase cwnd from BDP/s 
to BDP is (BDP/MSS)/2 RTTs.

And, if the number of flows is increased, then the time between the marks 
decreases. So, we need to be careful in understanding what we are 
modelling. Otherwise, if we just play with the formulae, the outcome is 
just crap.

Best regards,

/Markku

>> * Cubic: Obviously, your two formulae compared below cannot both be true 
>> (by dimensional analysis, ironically):
>>     (BDP/MSS) = 1.054 * (RTT / p)^0.75
>>     (BDP/MSS) = 1.054 * (RTT*MSS / p)^0.75
>>     p is dimensionless (a fraction) so you cannot just divide it by MSS to 
>> "get a byte quantity", while not changing the other side of the equation.
>> 
>> 
>> ==Formula Transformations==
>> Here, we transform rate formulae in terms of (time between marks) into 
>> formulae in terms of (marking probability).
>> 
>> For brevity I'll use the following notation, all at steady state, and with 
>> units shown in [square brackets]:
>>     R : RTT [s/round]
>>     S : MSS [B]
>>     x : flow rate [B/s] = S/R * cwnd
>>     t_m : time between marked packets [s/marked_pkt]
>> 
>>     r : packet rate [pkt/s] = x/S by definition
>>     BDP : bandwidth-delay product [B/round]
>>     p : marking probability [], meaning a dimensionless fraction
>> 
>> The time between marks is the reciprocal of the marks per time:
>>     t_m = 1/(p*r)
>> where p*r is the marks per time, because r is packets per time, and each 
>> packet is marked with probability p.
>> Expressing this in terms of flow rate rather than packet rate:
>>     t_m = S / (p*x) (1)
>> 
>> We can then substitute the above in your formulae for rounds per mark:
>> 
>> ==DCTCP==
>>     mark per round = 2
>>     mark per time = 2 / R
>> Time per mark is the reciprocal,
>>     t_m = R/2
>> Substituting from eqn (1)
>>     S / (p*x) = R/2
>>     x = S/R * 2/p
>> which is the well-known DCTCP formula for flow rate, better known in terms 
>> of cwnd in units of [packet]:
>>     cwnd = 2/p.
>> 
>> ==Reno ==
>>    rounds/mark = 2/3 * BDP/S            (corrected from your formula)
>>     t_m = R* (rounds/mark)
>>            = 2/3 R*BDP/S
>>            = 2/3 R*R*x/S
>> Substituting from eqn (1)
>>     S / (p*x) = 2/3 R*R*x/S
>> Rearranging and collecting terms:
>>     x^2 = 3/2 S^2/(R^2 * p)
>>     x = S/R sqrt(3/2p)
>> which is the well-known Reno formula for flow rate, better known in terms 
>> of cwnd in units of [packet]:
>>     cwnd = sqrt(3/2p).
>> 
>> ==Cubic==
>> We can transform in the other direction from the Cubic formula that RFC 
>> 8312 expressed in p, to a formula expressed in t_m:
>>     (BDP/S) = 1.054 * (R / p)^0.75
>>     x*R/S = 1.054 * (R * t_m * x / S)^0.75
>> Rearranging and collecting terms:
>>     x^0.25 = 1.054 * S^0.25 * t_m^0.75 / R^0.25
>>     x = 1.054^4 * S * t_m^3 / R
>> 
>> In summary, the following generalized formulae can be used to express flow 
>> rate either in terms of p (marking probability) or in terms of t_m (time 
>> between marks):
>>     x = k * S / (p^b * R^c)
>>     x = k * S * t_m^d / R^e
>> where:
>>             b    c    d      e
>>     ____________________________
>>     DCTCP   1    1  infty  infty
>>     Reno   1/2   1    1      2
>>     Cubic  3/4  1/4   3      1
>> 
>> (the infty values are because DCTCP flow rate, x, does not depend on t_m or 
>> R, given t_m = R/2, with no dependence on x)
>> 
>
> -- 
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
>
>