Re: [tcpm] WGLC for draft-ietf-tcpm-accurate-ecn-23

Markku Kojo <kojo@cs.helsinki.fi> Mon, 10 July 2023 17:28 UTC
Date: Mon, 10 Jul 2023 20:27:53 +0300
From: Markku Kojo <kojo@cs.helsinki.fi>
To: rs.ietf@gmx.at
cc: Bob Briscoe <ietf@bobbriscoe.net>, Mirja Kuehlewind <ietf@kuehlewind.net>, tcpm@ietf.org
In-Reply-To: <eebc0dbc-1d32-3b52-2f04-c21eae1dbb39@gmx.at>
Message-ID: <5ec15dcf-7fdc-dd1b-f43d-948c38a09c32@cs.helsinki.fi>
References: <7EC77745-BA44-4CA5-8B14-9430988B7510@fh-muenster.de> <alpine.DEB.2.21.2303260458560.4394@hp8x-60.cs.helsinki.fi> <2c3e57b6-68c3-6664-3034-410ee4379899@gmx.at> <f39cde47-aefe-4e2e-133a-c19ebd5f8dca@cs.helsinki.fi> <eebc0dbc-1d32-3b52-2f04-c21eae1dbb39@gmx.at>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/95_f4bHXIn8TwKY0Jle-wUpu-YE>
Subject: Re: [tcpm] WGLC for draft-ietf-tcpm-accurate-ecn-23
Precedence: list
Hi Richard,

My apologies for taking so long before replying.

TL;DR

In Sec 3.2.2.5.1, the Increment-Triggered ACKs rule includes steps 
that implement experimental behaviour. Hence, the steps should not be 
included in a document expected to be published as PS. Otherwise, any 
implementation following the rules defined in this document becomes an 
automaton for sending Acks of Acks if the other end decides to enable 
ECN-capable pure Acks for whatever reason.

It seems that we agree that sending Acks of Acks (or making pure Acks 
ECN-capable) is "MUST NOT" unless SACK is in use, instead of "SHOULD NOT" 
as specified in -24.

It seems that using TCP timestamps is not a viable way to distinguish 
genuine DupAcks from "fake" DupAcks (Acks of Acks). Bob and I seem to 
agree on this; Richard disagrees.

For details, please see inline tagged [MK2]

On Thu, 8 Jun 2023, rs.ietf@gmx.at wrote:

> Hi Markku,
>
> see inline with [RS]
>
>> Please find a few scenarios related to the problems with Acks of Acks 
>> available at:
>>
>>   https://www.cs.helsinki.fi/u/kojo/IETF/AccECN2023.pdf
>
>>> On Mon, 27 Mar 2023, rs.ietf@gmx.at wrote:
>> 
>> 
>> This is all clear. Given that the ECT-marked pure Acks are present only in 
>> experiments (ECN++ or maybe some others), why does this stds track document 
>> even mention them and the need to send Acks of Acks which are not part of 
>> the stds track protocol but only valid with experiments like ECN++?
>> Handling CE-marked pure Acks is clearly experimental and it seems too early 
>> and immature to decide how to best convey feedback on them in a stds track 
>> protocol. Experimental protocol would be different but I don't want to open 
>> that discussion, wg has made its decision.
>
> [RS] Experience has shown that it is much harder to deploy a change on the 
> receiver side than it is to deploy a unilateral change on the sender side. 
> Therefore, the reaction of the receiver (reflector for ECN marks) needs to be 
> defined more stringently - while also trying to be ready for possible future 
> extensibility of the signaling schemes for yet unforeseen purposes.

[MK2] My apologies but I do not quite follow. TCP wire protocol itself 
and the signaling protocol specified in this doc are both bidirectional 
protocols. If you need a change, there is no "receiver" end separately 
but both ends of a connection include both sender and receiver side 
implementation of this protocol as specified in this doc. So, it is 
equally easy or hard to change either the receiver or the sender.
If you refer to congestion control algos that are mostly sender only 
algos, I agree but with this signaling protocol I disagree. And, the 
behaviour we are arguing about is for experimental purposes only which 
means that an implementer should be prepared to change it anyway; it's 
much harder to remove the deployed standard code from stacks than 
experimental code in case the experiment fails, for example.

[MK2] So, the major problem I am concerned about is that the 
Increment-Triggered ACKs rule (in Sec 3.2.2.5.1) includes steps that 
specify experimental behaviour:

  "Increment-Triggered ACKs: An AccECN Data Receiver MUST emit an ACK
     if 'n' CE marks have arrived since the previous ACK. If there is
     newly delivered data to acknowledge, 'n' SHOULD be 2. If there is
     no newly delivered data to acknowledge, 'n' SHOULD be 3 and MUST
     be no less than 3. In either case, 'n' MUST be no greater than 7."

The third sentence is meaningful only when an endpoint is participating 
in the (upcoming) ECN++ experiment. However, an implementer following 
these rules will implement and deploy an automaton in the stack for 
sending these Acks of Acks and has no control on whether the stack 
participates in the experiment or not; as soon as the other end enables 
ECN-capable pure Acks the stack will automatically be part of any such 
experiment even though the implementer does not want the stack to 
participate in any such experiment for a valid reason (e.g., a 
battery-driven wireless device that does not want to send any extra 
packets to save energy or on expenses).
IMHO, this is very concerning because we have little experience of acking 
pure Acks at scale and if such behaviour turns out to be even more harmful 
than currently expected, it will be hard to get this code out of the 
wild. There have been proposals, and in the future there may well be 
other experiments in addition to ECN++ that enable ECN-capable pure Acks.
So, it may also happen that the ECN++ experiment is otherwise successful but 
acking Acks turns out to be neither useful nor harmful to a significant extent, 
meaning that there is no immediate need to remove this code from the 
deployed base and it stays there even if ECN-capable pure Acks are 
decided to be disabled for ECN++.  Then, if years (decades) later it is 
decided to enable pure Acks again for some purpose (experiment?) other 
than ECN++, it is quite likely that no one recalls this Acks of Acks 
behaviour included in this standards track spec because all necessary 
(feedback) actions are redesigned for the new purpose. This would result 
in completely unnecessary Acks of Acks being sent around, and quite likely 
also without any safety procedures, i.e., also when using non-SACK 
connections which will bring out all the disadvantages that have been 
brought out in this thread (and maybe more).

[MK2] In addition, the Increment-Triggered ACKs rule is somewhat 
ambiguous and it would be useful to clarify it.

[MK2] First, the heading of Sec 3.2.2.5.1 is "/Data Receiver/ Safety 
Procedures"  and the Increment-Triggered ACKs rule starts with "An AccECN 
/Data Receiver/ MUST emit an ACK ...". This is fine with the second 
sentence:
"If there is newly delivered data to acknowledge, 'n' SHOULD be 2.", 
which clearly defines operation for a Data Receiver. However, the third 
sentence reads: "If there is no newly delivered data to acknowledge, 'n' 
SHOULD be 3 ...". This defines the sending of Acks of Acks but it is 
effectively the operation for a Data Sender in the first place because 
the primary sender of the Acks of Acks is the Data Sender that may 
receive CE-marked pure Acks that are acking its data segments (later, if 
the ping-pong effect materializes, the Data Receiver may also be the sender 
of Acks of Acks).

[MK2] Second, the draft uses an undefined term/concept "newly delivered 
data" which I don't recall seeing in any other TCP-related RFC. I believe 
"If there is newly delivered data to acknowledge" means "If a data 
packet marked CE arrives, 'n' SHOULD be 2." which would be unambiguous I 
think. Otherwise, someone might interpret it to mean e.g., "if there is 
unacknowledged data".

[MK2] Third, it would be useful to add a few sentences to clarify how 
these rules are intended to coexist with the TCP Delayed Ack mechanism.
They complement it, I believe.
The draft has Sec 2.3 that has "Delayed Acks" in its heading but it does 
not have anything to do with the Delayed Ack mechanism. Maybe it would 
also be useful to refine the Sec 2.3 heading such that no one assumes 
that it would discuss the Delayed Ack mechanism like I did?

[MK2] To sum up, I would suggest new text for the Increment-Triggered 
ACKs rule:
  "Increment-Triggered ACKs: When a data packet marked CE arrives,
     an AccECN Data Receiver MUST emit an ACK if 'n' CE marks have
     arrived since the previous ACK. 'n' SHOULD be 2 and 'n' MUST
     be no greater than 7."

That is, IMHO the rule for sending Acks for Acks should be in an 
experimental document of its own (if not in ECN++ doc).
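
To make the difference concrete, here is a minimal sketch of the two 
variants of the Increment-Triggered ACKs rule quoted above. It is an 
illustration only, not text from the draft: the class and field names are 
hypothetical, delayed Acks and the upper bound of 7 are left out, and the 
proposed variant is modelled by assuming that CE marks on pure Acks never 
trigger an Ack.

```python
from dataclasses import dataclass


@dataclass
class IncrementTrigger:
    """Hypothetical model of the Increment-Triggered ACKs rule."""
    n_data: int = 2             # 'n' when there is newly delivered data
    n_pure_ack: int = 3         # 'n' when there is no newly delivered data
    ack_pure_acks: bool = True  # False models the proposed text (assumption)
    ce_since_ack: int = 0       # CE marks seen since the previous ACK

    def on_packet(self, ce_marked: bool, has_new_data: bool) -> bool:
        """Return True if this arrival forces an ACK under the modelled rule."""
        if not ce_marked:
            return False
        if not has_new_data and not self.ack_pure_acks:
            # Proposed variant: a CE-marked pure ACK never triggers an ACK.
            return False
        self.ce_since_ack += 1
        threshold = self.n_data if has_new_data else self.n_pure_ack
        if self.ce_since_ack >= threshold:
            self.ce_since_ack = 0
            return True
        return False


def count_forced_acks(rule: IncrementTrigger, arrivals) -> int:
    """arrivals: iterable of (ce_marked, has_new_data) tuples."""
    return sum(rule.on_packet(ce, data) for ce, data in arrivals)


# Nine CE-marked pure ACKs, as in the Fig 1 scenario discussed in this thread.
nine_ce_pure_acks = [(True, False)] * 9
current = count_forced_acks(IncrementTrigger(), nine_ce_pure_acks)
proposed = count_forced_acks(IncrementTrigger(ack_pure_acks=False),
                             nine_ce_pure_acks)
print(current, proposed)
```

With the current text, the stack emits Acks of Acks as soon as the peer 
makes its pure Acks ECN-capable; with the proposed text it does not.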

> [RS] By defining a clear mechanism, which has benign impact even if abused (a 
> stream of CE-marked ACKs will trigger at most 1/3 counter-driven ACKs for the 
> CE marks; a pathological stream of alternating CE and non-CE may, if advice 
> in the draft is not heeded, trigger at most 1/2 change-triggered ACKs for 
> those CE marks; if these ACKs are then ECT-marked (experimental), and the 
> reverse path exhibits similar congestion markings, there may be a certain 
> amount of "ACK-ping-pong" but it's dampened and will quickly die down (a few 
> RTTs at most).
> 
> [RS] But to reiterate, the most extreme of such "dampened avalanche" effects may 
> only show up in partially conformant AccECN implementations

[MK2] As I have tried to say, the (dampening) ping-pong effect of Acks of 
Acks is not the major concern even though it can be solved much more 
effectively by not acking CE-marked DupAcks which is always unnecessary 
or even harmful as I already described earlier (below). 
Or, can you possibly show a scenario where acking a CE-marked DupAck is 
useful instead of ignoring it (presumably also in the counters)? At 
least, the stringent reaction to such CE-marked DupAcks seems not mature 
enough for stds track but merely research. Or do we have some 
experimental evidence?
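
For reference, the dampening that Richard describes can be worked through 
with a few lines of arithmetic. This is only a sketch under worst-case 
assumptions that are not in the draft: every pure Ack in both directions 
is CE-marked, no data is piggybacked, and each side emits exactly one Ack 
per 'n' = 3 CE marks. The function name is hypothetical.

```python
def ping_pong_rounds(initial_ce_acks: int, n: int = 3) -> list[int]:
    """Number of Acks of Acks emitted in each successive half round trip.

    Assumes 100% CE marking on pure ACKs in both directions and one
    triggered ACK per n CE marks (leftover marks are dropped for simplicity).
    """
    counts = []
    acks = initial_ce_acks
    while acks > 0:
        acks //= n          # at most one ACK per n CE-marked ACKs
        counts.append(acks)
    return counts


print(ping_pong_rounds(27))
```

The sequence shrinks geometrically, which is why the ping-pong itself is 
not the major concern here; the concern is what each of these extra Acks 
does to the algorithms at the receiving end.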

>>> Now, CE-on-ACK:
>>> 
>>> As long as the side receiving the CE-marked ACK has any more data
>>> outstanding to be sent - the incremented ACE counter will be
>>> piggy-backed on this new data packet. Further, the text in the AccECN
>>> draft is also written in a way that this incremented ACE counter does
>>> *NOT* need immediate reflection, but rather may increase by at least 3.
>>> 
>>> So, as long as either half-connection has more data to send, there is
>>> no risk of any ACK-on-ACK ping-pong, even when all the pure ACKs on one
>>> half-connection get CE marked.
>>> The only point in the evolution of a TCP session, where ACK-on-ACK could
>>> be expected is when
>>> 
>>> Host A no longer has any data (new, or retransmissions) to transmit
>>> Host B no longer has any data (new, or retransmissions) to transmit
>>> The Path for *both* Half-Connections experience severe congestion
>>> (100% CE marking probability on the pure ACKs).
>> 
>> This all clear but sounds like a bit limited view of the possible 
>> scenarios. There are several variations of bidirectional data transmission 
>> where sending data alternates between the end points, e.g., with a typical 
>> request-reply type of application behaviour. Please see my discussion on 
>> my  Fig. 2 towards the end.
>> 
>>> Since no new, or retransmitted data would be ready to piggy-back the
>>> changed ACE counters, I do not see how this rapidly dampening
>>> ACK-on-ACK scenario would have any impact on any of the mechanisms
>>> (Fast Retransmit doesn't trigger on the multiple identical ACK fields on
>>> data packets, when the receiver doesn't have data to send).
>> 
>> The dampening ping-pong effect is not a big concern but sending unwarranted 
>> dupAcks is. Please see Figure 1 of my example scenarios. It's a slight 
>> variation of the scenario that Bob pointed out on slide 6 here:
>>
>>   https://datatracker.ietf.org/meeting/111/materials/slides-111-tcpm-accurate-ecn-01
>> 
>> It also corresponds pretty much to the scenario you described in your 
>> follow-up message that I try to cover here as well (sorry for not copying 
>> it here). The major difference is in that neither the scenario on the slide 
>> 6 nor the one in your follow-up description continued far enough to fully 
>> understand all problems the Acks of Acks create.
>> 
>> Now, let's look at my Fig 1 and assume neither SACK nor Timestamps are in use, 
>> so the TCP senders employ NewReno.
>
> [RS] Then neither Host A nor B are supposed to send out ECT-enabled ACKs in 
> the first place. Severe congestion on the Return Path would lead to ACK loss 
> rather than CE-marked ACKs, and no ACKs reflecting the changed CE counter 
> from A to B (as non-ECT pure ACKs don't change the counter)...

[MK2] Sure, if there are no ECT-enabled ACKs, but I was commenting on the 
current text that says "SHOULD NOT" which leaves the opportunity that 
Acks of Acks are sent out. If it is changed to "MUST NOT" then we don't 
need to worry for that part. If I understood correctly your comment later 
(below), then we agree that it should be changed to MUST NOT, right?

[MK2] However, keeping the current Increment-Triggered ACKs rule as is in 
this draft potentially results in sending Acks of Acks also with non-SACK 
TCP connections, as I explained above.

>> A and B alternate in sending data. A starts with a window size of at least 
>> 18 MSS that is enough to trigger 9 cumulative Acks from B to A. The 
>> direction from B to A is (heavily) congested so that at least 9 pure Acks 
>> get CE marked and trigger at least 3 extra dupAcks from A to B that do not 
>> correspond to any data packet from B to A that have left the network as 
>> required by RFC 5681 (that is, this is a violation of the crucial 
>> requirement in RFC 5681 that is a prerequisite for correct operation of 
>> several stds track and other CC algos).
>> 
>> Once the first two dupAcks arrive at B they incorrectly fool Limited 
>> Transmit into sending two extra data packets even though no data packets 
>> have left the network (the justification for Limited Transmit to send new 
>> data packet relies on packet conservation principle and that a dupAck 
>> indicates a data packet has left the network as required in RFC 5681).
>> When the 3rd dupAck arrives at B, it triggers a false Fast Retransmit and 
>> fools B to enter Fast Recovery. The major problem of false Fast Retransmit 
>> is not the potential unnecessary reduction of cwnd that hurts the flow from 
>> B to A but the subsequent unnecessary retransmissions during Fast Recovery 
>> that hurt others by injecting unnecessary load into the network;
>> when the original Acks from A to B start flowing in, the TCP sender at B 
>> considers them as partial Acks and triggers an unnecessary rexmit for each 
>> original cumulative Ack, which flow in at the bottleneck line rate. 
>> Moreover, if there are more than 3 Acks of Acks (=dupAcks), each of them 
>> fools Fast Recovery into incorrectly increasing cwnd by one MSS and sending out 
>> an extra new data segment, making B more aggressive than allowed (also due 
>> to violating "MUST NOT" in RFC 5681).
>> 
>> These Acks of Acks are problematic also in a slightly alternate scenario, 
>> where B first (before A) starts sending data (data pkts 1-100) but maybe to 
>> some extent simultaneously with A and B encounters a data packet loss and 
>> correctly enters loss recovery (Fast Recovery). In this case, once A starts 
>> receiving CE-marked pure Acks (Acks 22, 24,...) after the data packets from 
>> B, A will continue injecting extra dupAcks to B that will make it send more 
>> new packets during Fast Recovery than allowed, i.e., it causes harm to 
>> others by being too aggressive. If the path from B to A is highly congested 
>> such that B and other flows competing it have a relatively small cwnd of a 
>> few pkts only, then just a few Acks of Acks from A to B may result in B 
>> effectively not reacting to congestion at all during the fast recovery. 
>> Definitely something to avoid.
>
> [RS] Again, your scenario falls outside the scope of the document; while I 
> agree with the analysis, there are additional assumptions:
> - either an incorrect/ignorant implementation, where the Host sends out 
> ECT-marked pure ACKs, when neither SACK nor TSopt are available.

[MK2] Again, it is not outside of scope as long as the text says SHOULD 
NOT or timestamps are suggested as a solution to the problem (see my 
comments on timestamps below).
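
The non-SACK case can be sketched as follows. The duplicate-ACK test 
mirrors the definition in RFC 5681 (outstanding data, no payload, SYN and 
FIN off, same acknowledgment number, same advertised window); everything 
else (class names, the returned action strings, the omission of cwnd 
arithmetic) is a simplification made for illustration and is not taken 
from any implementation.

```python
from dataclasses import dataclass


@dataclass
class Ack:
    ack_no: int
    window: int
    payload_len: int = 0
    syn_or_fin: bool = False


@dataclass
class NewRenoSender:
    """Hypothetical non-SACK sender that only counts duplicate ACKs."""
    snd_una: int
    snd_nxt: int
    last_window: int
    dupacks: int = 0

    def is_dupack(self, a: Ack) -> bool:
        # Duplicate ACK conditions (a)-(e) as defined in RFC 5681.
        return (self.snd_nxt > self.snd_una       # (a) data outstanding
                and a.payload_len == 0            # (b) carries no data
                and not a.syn_or_fin              # (c) SYN and FIN off
                and a.ack_no == self.snd_una      # (d) same ack number
                and a.window == self.last_window) # (e) same window

    def on_ack(self, a: Ack) -> str:
        if self.is_dupack(a):
            self.dupacks += 1
            if self.dupacks < 3:
                return "limited_transmit"   # send one new segment
            if self.dupacks == 3:
                return "fast_retransmit"    # enter Fast Recovery
            return "inflate_cwnd"           # each extra dupAck adds one MSS
        if a.ack_no > self.snd_una:
            self.snd_una = a.ack_no
            self.dupacks = 0
        self.last_window = a.window
        return "new_ack_or_window_update"


# B has data outstanding; A sends three Acks of Acks that carry only a
# changed ACE counter. Without SACK they look like ordinary dupAcks to B.
b = NewRenoSender(snd_una=1000, snd_nxt=5000, last_window=65535)
acks_of_acks = [Ack(ack_no=1000, window=65535) for _ in range(3)]
print([b.on_ack(a) for a in acks_of_acks])
```

Nothing in the five conditions lets B tell that no data packet has left 
the network, which is the crux of the problem for non-SACK connections.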

> - a middlebox, incorrectly, but selectively tweaking the "TOS" byte to make 
> the pure non-ECT-ACKs suddenly ECT-ACKs (or CE-ACKs; if that would be the 
> case for data segments, that path would be unusable with ECN) close to the 
> respective Host.
> (a receiver could also choose to ignore CE-marked pure ACKs from incrementing 
> the CE counter, and subsequent mechanisms to convey this change to the other 
> side, when the session is not SACK or TSopt enabled.

[MK2] I am afraid that an end-point cannot ignore CE-marked pure ACKs and 
refrain from sending Acks of Acks when the text reads:
  "An AccECN Data Receiver /MUST/ emit an ACK
   if 'n' CE marks have arrived since the previous ACK."

> I believe this is one 
> aspect that could be added under a "MAY" in the draft. If memory serves, we 
> didn't want to have too many dependencies in the std-track document with 
> tangential mechanisms though.)

[MK2] I think it would be a much more effective and simple way forward to 
make this "MUST" a "SHOULD" and add a short explanation that an end point 
may choose not to send Acks of Acks.

> [RS] If the congestion is real, very likely a significant fraction of the 
> erroneously retransmitted packets will get lost, and multiple rounds of cwnd 
> reduction will furthermore decrease the effective sending rate - after the 
> initial burst which, as you correctly point out, could harm other flows.

[MK2] I am no longer sure whether you assume here that pure Acks are ECT or not.
If CE-marked pure Acks are present, I'm afraid this is not generally 
correct either. When a path is congested such that several packets become 
CE-marked, it does not need to mean that it is pathologically congested 
such that the AQM starts dropping packets instead of marking. Many AQMs 
keep on marking with a very high marking rate when there are enough 
capacity-seeking flows and particularly when there are (also) newly 
started flows that slow start and thereby increase the queue rapidly and 
will react to congestion only too late due to slow-start overshoot. AFAIK 
most AQMs start dropping only long after they have practically been 
marking almost all packets.

[MK2] In addition, it is useful to understand the packet dynamics on a 
congested path. When a path is congested simply because there are enough 
capacity-seeking senders, it would mean that the path stays congested 
from one RTT to another, instead of being a one-shot event ending once a 
flow has reduced its cwnd (even if it does it for multiple rounds).
This is because the reaction to congestion by one flow (or some flows) is 
quickly overtaken by the other flows. As long as there are several flows 
seeking for more capacity than what is available, the bottleneck stays 
congested and CE-marked data packets and pure acks are present from one 
RTT to another, resulting in the scenarios repeating continuously but 
each time affecting a different cwnd size for each of the flows.

>> Timestamps:
>> ===========
>> 
>> Let's next assume timestamps are in use. If I understand it correctly, the 
>> current proposal to avoid interpreting Acks of Acks as dupAcks is:
>>
>>   "If timestamps are in use, and the incoming pure ACK
>>    echoes a timestamp older than the oldest unacknowledged
>>    data, it is not a duplicate."
>> 
>> If we look at my Fig 1 and assume that timestamps are in use and the 
>> timestamp value for each data packet is the same as its sequence number, 
>> that is, the timestamp for data 0 from B to A has TSval=0, etc.
>> 
>> Now, let's first assume pure Acks are not ECT (Acks of Acks denoted as red 
>> do not occur) and there is a normal case with a packet loss from B to A 
>> (data pkt 1 is dropped). My apologies for not drawing this separately.
>> When the data pkts 2, 3, and 4 from B to A arrive at A, they trigger 
>> dupAcks that carry Ack=1 and TSecr=0 as per RFC 7323. The echoed timestamp 
>> TSecr=0 is older than the timestamp value of the oldest unacknowledged data 
>> (data 1 that has TSval=1). These are definitely dupAcks but the proposed 
>> rule above would declare them not being dupAcks and hence prevent the 
>> normal dupAck-based loss detection from working!?
>
> [RS] I believe this sentence erroneously contains a logical negation;

[MK2] Not sure if you are referring to the quoted text (from Bob) or my 
text? If the quoted rule was erroneous, that may explain part of the 
confusion but does not solve the problem (pls see below).

> [RS] With a hole in the received data sequence, TSecr will continue to 
> reflect back the TSval of the oldest observed continuous data segment, as you 
> correctly point out;
>
> [RS] However, with all data received in-sequence, followed by pure ACKs with 
> newer timestamps, TSecr will reflect the TSval of the most recently observed 
> pure ACK.
>
> [RS] So,
> 	If timestamps are in use, and the incoming pure ACK echoes
> 	a timestamp more recent than the oldest acknowledged data,
> 	it is not a duplicate. If it is older than the oldest
> 	unacknowledged data, it is a duplicate.

[MK2] Hmm, these two rules (sentences) seem to be in conflict. In any 
TCP connection, all pure Acks that arrive after the cumulative Ack of the 
first data segment and acknowledge a higher sequence number than the Ack 
of the first data segment fulfill the condition in the first sentence, 
including all genuine DupAcks? Do you possibly mean "pure ACK echoes a 
timestamp more recent than the timestamp from the segment with highest 
sequence number that has been cumulatively acked, it is not a 
duplicate."?
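
A small worked example of the conflict, under stated assumptions: TSval 
of each data segment equals its segment number (as in the Fig 1 
discussion), "oldest acknowledged data" is read as the first data segment 
ever acknowledged, and the function name is hypothetical. The two return 
values are the verdicts of the two sentences of the rule quoted above.

```python
def two_sentence_rule(tsecr: int,
                      oldest_acked_tsval: int,
                      oldest_unacked_tsval: int) -> tuple[bool, bool]:
    """Evaluate both sentences of the quoted rule for one incoming pure ACK.

    Returns (sentence 1 says "not a duplicate", sentence 2 says "duplicate").
    """
    says_not_duplicate = tsecr > oldest_acked_tsval
    says_duplicate = tsecr < oldest_unacked_tsval
    return says_not_duplicate, says_duplicate


# Segments 0..4 arrived in order and were acknowledged, segment 5 was lost,
# segments 6, 7, 8 arrive. Per RFC 7323 the resulting genuine dupAcks echo
# the timestamp of the last in-order segment, i.e. TSecr = 4.
print(two_sentence_rule(tsecr=4, oldest_acked_tsval=0, oldest_unacked_tsval=5))
```

Under this reading, both sentences fire for the same genuine dupAck, so 
the rule as worded cannot be applied consistently.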

[MK2] Anyway, the problem is not solved (see below).

>>> If we then assume pure Acks are ECT-marked in Fig 1, what would be the 
>>> TSecr in the three first dupAcks? What is the rule to set TSecr in Acks 
>>> that ack pure Acks and where is it specified? And, how would it prevent 
>>> the Acks of Acks from being interpreted as dupAcks? I am probably missing 
>>> something here. Please explain.
>
> [RS] See RFC7323; Section 4.2, rule (2).

[MK2] I believe you intended to point out Sec 4.3, rule (2)? (There are no 
itemized rules in Sec 4.2.)

> [RS] Pure ACKs (with "newer" timestamps; all the blue CE-marked Acks) will 
> update ts.recent when there is no hole in the received sequence space. Thus 
> the first red ACK (with updated CE counter) will have the TSecr 3, followed 
> by TSecr 6 and TSecr 9.

[MK2] Your comment above indicates how tricky and unobvious it is to 
deduce how TCP should operate here. Particularly when all TCP specs have 
been written with a well-known assumption that TCP does not ack pure 
ACKs, that is, the rules in specs do not consider the option of acking 
ACKs at all.

[MK2] If you make the interpretation as you did above, it literally 
and correctly follows RFC7323, Section 4.3, rule (2) but that rule was 
written for arriving /data/ segments only and is correct in that case. 
However, the rule does not consider the possibility of acking pure ACKs. 
Instead, we need to look into the text in RFC 7323 that explains the 
logic of the rules (the text before rule (2), items (A)-(C)). A pure 
ACK does not advance RCV.NXT, so it does not update TS.Recent. If it did, 
as you suggest, it would mess up the RTTM mechanisms, resulting in 
incorrect RTT measurements. If I understood it correctly, Bob agrees with 
me here.
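
The two readings can be put side by side in a short sketch. The first 
method applies the condition of RFC 7323, Sec 4.3, rule (2) literally 
(SEG.TSval >= TS.Recent and SEG.SEQ <= Last.ACK.sent); the second adds 
the assumption argued for above, namely that only segments carrying data 
are candidates. The class and method names are hypothetical and 
Last.ACK.sent is held constant for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int
    length: int   # payload bytes; 0 for a pure ACK
    tsval: int


@dataclass
class TsState:
    ts_recent: int
    last_ack_sent: int

    def update_literal(self, seg: Segment) -> None:
        # Condition of rule (2) applied to every arriving segment.
        if seg.tsval >= self.ts_recent and seg.seq <= self.last_ack_sent:
            self.ts_recent = seg.tsval

    def update_data_only(self, seg: Segment) -> None:
        # Assumed reading: a pure ACK does not advance RCV.NXT and is
        # therefore never a candidate for updating TS.Recent.
        if seg.length > 0:
            self.update_literal(seg)


# All data up to sequence number 100 has been received and acknowledged;
# three CE-marked pure ACKs with TSval 1, 2, 3 then arrive.
pure_acks = [Segment(seq=100, length=0, tsval=t) for t in (1, 2, 3)]

literal = TsState(ts_recent=0, last_ack_sent=100)
data_only = TsState(ts_recent=0, last_ack_sent=100)
for seg in pure_acks:
    literal.update_literal(seg)
    data_only.update_data_only(seg)

print(literal.ts_recent, data_only.ts_recent)
```

Only the literal reading yields the TSecr values 3, 6, 9 mentioned above, 
and it does so by letting pure Acks feed the RTT measurement.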

> [RS] On a genuine loss, as you correctly stated, TSecr would remain at 0 for 
> these consecutive Dupacks, differentiating these two cases from each other.
>
> [RS] That quoted statement from draft -22 needs to be updated (in the ECN++ 
> draft).
> 
>> I didn't include a scenario of problems with Eifel detection because I am 
>> still not sure what is the TSecr value you "assume" is delivered with Acks 
>> of Acks? I also later noticed that the scenario I first had in mind was 
>> also possible without Acks of Acks if data packets are reordered similar to 
>> Acks of Acks in my scenario. So, probably it is only that Acks of Ack will 
>> make the problem to occur more often because small Acks are more likely to 
>> get reordered than larger data pkts, particularly if they are CE marked and 
>> operate in an L4S setting.
>> 
>> 
>> Spurious RTO detection
>> ======================
>> 
>> If we next look at the Fig 3 where Fig 3a) represents the normal behavior 
>> of F-RTO in a typical case which may result in spurious timeout that F-RTO 
>> is designed to detect. A and B alternate in sending data. B first sends 
>> several data packets (packets 2000 - 2000+n) that all get cumulatively 
>> acked. Then A continues by sending data (pkts 101 - 200) but due to 
>> (wireless) network conditions the acks get delayed such that A's RTO 
>> expires spuriously. A sets cwnd = 1, rexmits data pkt 101 and enters slow 
>> start. Because the RTO was spurious, the original ack (ack 103) arrives and 
>> increases cwnd by 1. A sends two new data pkts (pkts 200 and 201) as per 
>> F-RTO and waits until an ack arrives. The Ack that arrives next is the next 
>> original ack (ack 105). Because this Ack also acked new data, F-RTO 
>> declares RTO spurious, exits RTO recovery and continues by sending new 
>> data. Everything worked as designed.
>> 
>> In Fig 3b) we have the same setting but now pure Acks are marked ECT.
>> There is some congestion from A to B such that at least three of the 
>> cumulative Acks from A to B get CE-marked and trigger an extra dupAck from 
>> B to A even though no data pkt has left the network. This dupAck precedes 
>> the cumulative Acks from B to A and is delayed just like the cumulative 
>> Acks. A's RTO expires spuriously and it enters RTO recovery like in Fig 3a) 
>> but now the first ack to arrive at A is the extra dupAck.
>
> [RS] With a timestamp older than the RTO retransmitted data packet; A can 
> therefore deduce (when using the TSopt) that this ACK was not in fact ACKing 
> the RTO retransmission...

[MK2] First, the whole idea of F-RTO is to detect spurious RTOs without 
any options.
Second, I still cannot see what is the correct rule with timestamps that 
one should use in order to have F-RTO to make a correct decision. If there 
is such a rule, one would need to specify it and update RFC 5682.
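
For readers less familiar with F-RTO, the decision logic at stake can be 
sketched roughly as below. This is a simplified reading of the basic 
algorithm in RFC 5682 (steps 2 and 3), not a faithful implementation: 
Acks are reduced to their acknowledgment numbers, a duplicate Ack is 
taken to be one that does not advance SND.UNA, cwnd handling is omitted, 
and all names are hypothetical. Segment numbers follow Fig 3.

```python
def frto_decision(acks: list[int], snd_una: int, recover: int,
                  rexmit_end: int) -> str:
    """Classify an RTO from the first two ACKs after the RTO retransmission.

    acks       -- acknowledgment numbers in arrival order
    snd_una    -- SND.UNA when the RTO expired
    recover    -- highest sequence number sent before the RTO
    rexmit_end -- first sequence number beyond the retransmitted segment
    """
    if not acks:
        return "undecided"
    first = acks[0]
    # Step 2a: duplicate ACK, or ack equals recover, or the retransmitted
    # data is not fully acknowledged -> conventional RTO recovery.
    if first <= snd_una or first == recover or first < rexmit_end:
        return "conventional_recovery"
    # Step 2b: window advanced; up to two new segments would be sent here.
    snd_una = first
    if len(acks) < 2:
        return "undecided"
    # Step 3a: duplicate ACK -> loss assumed; 3b: new data acked -> spurious.
    if acks[1] <= snd_una:
        return "conventional_recovery"
    return "spurious_rto"


# Fig 3a: the delayed original ACKs 103 and 105 arrive after the rexmit.
print(frto_decision([103, 105], snd_una=101, recover=200, rexmit_end=102))
# Fig 3b: an extra dupAck (an Ack of Acks) arrives first.
print(frto_decision([101, 103, 105], snd_una=101, recover=200, rexmit_end=102))
```

The algorithm relies only on whether the first Acks advance the window, 
which is why a single extra dupAck at the front changes its verdict.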

>> Therefore, F-RTO incorrectly declares that the RTO was not spurious and
>> continues RTO recovery by retransmitting unacknowledged data in slow start.
>
> [RS] Only when no TS is used; SACK with DSACK would be another way of 
> detecting this, although delayed by one ACK in your example. With either of 
> these two (required to have ECT-enabled ACKs) options, there is no ambiguity 
> and this case can be resolved, the spurious RTO reverted...

[MK2] DSACK can detect spurious retransmissions but only after all 
unnecessary retransmissions have been sent and acknowledged. That is, it 
cannot avoid spurious retransmissions unlike F-RTO (and Eifel) and hence 
it is not an equivalent replacement for F-RTO. Instead, we also have 
a SACK-Enhanced version of the F-RTO algo, but it does not work 
correctly with Acks of Acks as currently specified (the algo would be 
ambiguous, to say the least).

>> When the original Acks start flowing in at the bottleneck rate, the TCP 
>> sender at A increases cwnd in slow start on each original Ack that arrives 
>> and unnecessarily retransmits the next segments at higher speed than before 
>> the RTO expired. Hence, it causes significant harm to the others by bursting 
>> out unnecessarily the whole window of data. These unnecessary rexmits are the 
>> major problem both with false Fast Retransmits as well as spurious RTOs, 
>> not the potentially unnecessary 
>> reduction of cwnd that harms the flow itself. That is also why the F-RTO 
>> and Eifel algos separated the spurious rexmit detection from the response 
>> to it; the detection itself is crucial as it avoids the unnecessary 
>> rexmits even if the sender decides not to implement the response that may 
>> revert the cwnd back to its previous value and thereby helps the flow 
>> itself.
>> 
>>> I suspect there may be a misunderstanding here, that there is a need to
>>> instantaneously send out a pure ACK right after receiving a CE-marked
>>> ACK. Which there is not - normal delayed ACK, and minimum ACE counter
>>> change heuristics are there to prevent this particular case.
>> 
>> That was not the case as you can now see.
>> 
>>> While we couldn't come up with any example of the ACK-on-CE-marked-ACK
>>> causing misbehavior (note that these AccECN Acks do carry changed
>>> information, the new, incremented ACE counter in order to keep both
>>> sides in sync there), the text suggesting for the sender not also
>>> implementing either SACK or Timestamps was added as a safety measure.
>> 
>> Because there is significant harm that Acks of Acks will cause to all algos 
>> that trust that TCP implementations honor RFC 5681 and send a dupAck 
>> only when a data packet has left the network, it is crucial that Acks of 
>> Acks MUST NOT be generated when SACK is not in use.
>
> [RS] SACK or TSopt; Agreed.
>
>> This can be forced either by prohibiting ECT-marked pure Acks or by 
>> prohibiting Acking of pure Acks. Currently the draft says "SHOULD NOT send 
>> ECN-capable pure ACKs". Why "SHOULD NOT", instead of "MUST NOT"? What might 
>> be the valid reasons when pure Acks would be allowed to be ECT-marked, 
>> resulting in Acks of Acks and violation of RFC 5681 and all problems 
>> discussed above and more (which were not discussed, e.g., PipeAck 
>> calculation in RFC 7661 and calculating DeliveredData in PRR, etc.)?
>
> [RS] MUST NOT is certainly more appropriate, when neither SACK or TS are in 
> use.

[MK2] MUST NOT if SACK not in use, agreed.

>> The same holds even if timestamps are in use unless someone specifies and 
>> shows how timestamps can reliably be used to distinguish "fake" dupAcks 
>> (Acks of Acks) from real dupAcks without breaking any of the existing 
>> mechanisms that rely on timestamps.
>
> [RS] See above; pure ACKs will update ts.recent (TSecr) if received after all 
> data is received in-sequence, thereby allowing the distinction between the 
> true DupAck and CE-counter update cases.
>
>> When SACK is in use and pure Acks are marked ECT, it seems possible to 
>> distinguish "fake" dupAcks from real dupAcks. However, one must first 
>> specify how pure Acks are acknowledged. This seems to require modification 
>> to RFC 2018 (as I already explained in my reply to Bob but seemingly need 
>> to clarify as Bob didn't get the problem, pls see my new reply to Bob).
>> 
>> 
>> Are Acks of Acks useful?
>> ========================
>> 
>> Last, let's consider whether acking pure acks is useful, and if so, when. 
>> As an important principle when deciding whether to inject packets into the 
>> network one should use the robustness principle: "be conservative in what 
>> you send ...". That is, one should carefully consider whether the packets 
>> are really needed; are they useful, and if so, when, or are they possibly 
>> unnecessary, and if so, when.
>> 
>> We can start with the example given in the AccECN draft (-24), Sec 
>> 3.2.2.5.1:
>>
>>   "In a unidirectional data scenario from host A to B where both hosts
>>   support AccECN, if the Data Receiver (B) has chosen to use ECN-
>>   capable pure ACKs [I-D.ietf-tcpm-generalized-ecn] and enough of these
>>   ACKs become CE-marked, then the 'Increment-Triggered ACKs' rule
>>   ensures that the Data Sender (A) gives B sufficient feedback about
>>   this congestion. Normally, the Data Sender (A) can piggyback that
>>   feedback on its data. But if A stops sending data, the second part of
>>   the 'Increment-Triggered ACKs' rule ensures that A emits a pure ACK
>>   for at least every third CE-marked incoming ACK over the subsequent
>>   round trip."
>> 
>> The above describes a unidirectional data scenario, which is quite typical, 
>> and claims that this ensures sufficient feedback [on CE-marked ACKs]. Sure, 
>> it does. But is this necessary?  Obviously, in this case the feedback can 
>> be used only for Ack CC because B is not sending anything.
>> However, B cannot apply Ack CC until A sends new data, so it is enough that 
>> the feedback is delivered with next data segment(s) and sending it 
>> separately in Acks of Acks earlier is absolutely unnecessary, just 
>> resulting in additional packet load in the network.
>
> [RS] You miss the case, where B may choose to start sending data itself - and 
> can incorporate the updated path information into the congestion state, 
> *prior* to starting to send that new data.

[MK2] Well, I am not. I am commenting on a scenario in the present draft that 
explicitly says "In a unidirectional data scenario", i.e., a scenario in 
which B is not expected to start sending data (that is what unidirectional 
means).

[MK2] My comments for the case where B may choose to start sending data 
were here right below, related to the next para of the present draft that 
discusses such a scenario.

> [RS] But in the pure unidirectional case, yes, AckCC would currently be one 
> potential future use case.
>
>> 
>> The draft continues:
>>
>>   "... in this case it is mandatory for A to emit ACKs of ACKs because they
>>    feed back new congestion state (useful in case B starts sending)."
>> 
>> As explained above, clearly it is not mandatory for A to emit ACKs of ACKs. 
>> But, as the draft correctly says, it is useful [only] in case B starts 
>> sending, but not always (e.g., if the TCP connection is half-open, B can 
>> never start sending).
>> 
>> Even if B starts sending, it is not necessarily useful (or practical).
>> Let's now look at the Fig 2 of my scenarios, where apps on A and B use very 
>> common request-reply type of communication: in this case A sends very short 
>> requests (maybe a few bytes only) and B replies with long replies. And this 
>> is continuous, interactive communication with quite strict delay 
>> requirements such that it is important that A is able to send the next 
>> request immediately after the previous reply from B. B has increased its 
>> cwnd such that it may send all packets of reply in one RTT
>> (B is application limited, but RFC 7661 or RFC 5681 would allow it to 
>> continue with the same cwnd from RTT to another as long as there is no 
>> congestion from B to A).
>> 
>> A sends the request (data 100) and B replies with n data segments, 
>> resulting in approx. n/2 cumulative Acks. The path from A to B is congested 
>> (by other traffic) such that many Acks get CE-marked, resulting in up to 
>> n/6 Acks of Acks that precede the next round of data from B to A.
>> This is continuous meaning that up to 16+% of packets from B to A are Acks 
>> of Acks if the path from A to B remains congested, which is possible if 
>> there is a large number of competing flows (even if they react to 
>> congestion). The figure is missing the later rounds of possible ping-pong 
>> Acks of Acks that may add to the Acks of Acks of the next round(s). Having 
>> these later, dampening ping-pong Acks requires, of course, that the path 
>> from B to A is also congested. For simplicity, let's also assume that Acks 
>> of Acks are somehow prevented from being interpreted as dupAcks, that is, 
>> they do not trigger false Fast Retransmits (and are not illustrated in the 
>> Fig even though they would cause additional problems also in this scenario 
>> if interpreted as dupAcks).
>> 
>> Now, let's consider usefulness of the Acks of Acks carrying congestion 
>> feedback. When A receives the feedback it may apply either Ack CC to reduce 
>> cumulative ack rate or decrease cwnd to reduce the data rate from A to B. 
>> If A uses the feedback for Ack CC purposes only (e.g., it learns via AccECN 
>> option that ack pkts only are contributing to the congestion), sending 
>> separate Acks of Acks is quite unnecessary as the feedback can be timely 
>> delivered with the data packets from B to A. If A decides to react by 
>> reducing its data rate, the early feedback with Acks of Acks might be 
>> useful. However, it might also result in undesired effects, maybe because A 
>> cannot distinguish whether it is its Ack or data that is contributing to 
>> the congestion when the AccECN option is not in use. If A decides to reduce 
>> its data rate and cwnd = 1 MSS, the only way to do it is to delay the next 
>> data pkt, which would result in undesired delay in delivering the next 
>> delay-sensitive request even though the data rate from A to B was not 
>> contributing to the congestion (much) at all. 
>> So, if Ack congestion is automatically reflected back to A (the data 
>> receiver) but data vs. Ack congestion is not separated, A as a data sender 
>> may have a hard dilemma to solve. Furthermore, if A is sending such small 
>> requests (<= 1 MSS) and is practically application limited, its cwnd is 
>> likely to be larger than 1 MSS. If it decides to reduce cwnd due to Acks of 
>> Acks reporting (Ack) congestion, it is likely to have a null effect. That 
>> is, it seems that Ack congestion is best reacted to by Ack CC reducing 
>> the (cumulative) Ack rate only.
>> 
>> One additional example where reflecting Ack congestion blindly in Ack of 
>> Acks is quite unnecessary is when there is a hole in the sequence space.
>> Assume A is sending data to B and a pkt gets dropped. This results in B 
>> doubling its ack rate for one RTT once it starts injecting dupAcks due to 
>> arriving out-of-order data. Now, if path from B to A is already congested 
>> or becomes congested due to increased ack rate, some (or many) of the 
>> dupAcks get CE-marked. That results in congestion feedback in Acks of Acks 
>> from A to B but B should not react to this added congestion feedback, 
>> because a pkt was lost and A will react by decreasing its data rate that 
>> will decrease future ack rate. In addition, the dupAcks were delivered only 
>> temporarily at the double rate and B will automatically stop injecting 
>> dupAcks and thereby first halves its Ack rate in one RTT and then returns 
>> back to sending cumulative acks with a reduced rate from what it was 
>> sending before the pkt drop was detected. Hence, also in this case Acks of 
>> Acks reporting congestion feedback are completely unnecessary.
>> 
>> Considering what I said above, in case one needs Acks of Acks, wouldn't it 
>> be much more effective to dampen ping-pongs, if one specifies that Ack of 
>> an Ack reporting CE on a pure Ack can only be sent if the pure Ack advanced 
>> SND.UNA, i.e., only count and report marked bytes/pkts in cumulative Acks. 
>> I don't find any reason to report CE-marked dupAcks (like in the previous 
>> example above) because the pkt load they represent is temporary extra load 
>> that has disappeared by the time the feedback arrives at the other end. So, 
>> they are not subject to a CC reaction, as the system has already reacted and 
>> removed that part of the load. Or maybe I am missing something?
>
>
> [RS] This is a very valuable discussion to have. However, it appears to me, 
> that this is not something in scope for the signalling schema to solve, but 
> rather higher level mechanisms such as AckCC, Congestion Response etc; one 
> insight to take away here is, that a host having negotiated AccECN and also 
> doing ECN++ and SACK/TS, will need to be able to selectively toggle 
> ECT-capable ACKs for extreme situations where the additional load of 
> AccECN-stipulated ACKs-for-CE-ACKs itself becomes a significant source of 
> congestion. However, I want to repeat that this is not something we have to 
> address in the signaling protocol itself, and can be worked out e.g. in the 
> ECN++ or AckCC experiments.

[MK2] Sure, it sounds like a good idea if one may selectively toggle 
ECT-capable ACKs to reduce the ping-pong effect, but it seems to me that 
selectively toggling the ECT-capability of Acks is neither a reasonable nor 
an effective way to restrict Acks of Acks. With such an approach one 
is still often sending the first round of Acks of Acks quite 
unnecessarily, with the potential drawbacks of Acks of Acks.
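
To put a rough number on that first round and on the later ping-pong 
rounds, here is a minimal sketch of the request-reply scenario quoted 
above. It assumes the worst case in which every pure Ack in both directions 
is CE-marked, about one cumulative Ack per two data segments, and one Ack of 
an Ack per three CE-marked Acks. The function name and its parameters are 
hypothetical, chosen only for illustration.

```python
def acks_of_acks_rounds(n_data_segments, ce_per_ack=3):
    """Count Acks of Acks per ping-pong round (illustrative worst case).

    Assumption: every pure Ack, in both directions, gets CE-marked.
    """
    # Roughly one cumulative Ack per two data segments (delayed Acks).
    acks = n_data_segments // 2
    rounds = []
    while acks >= ce_per_ack:
        # One Ack of an Ack for every 'ce_per_ack' CE-marked incoming Acks.
        acks = acks // ce_per_ack
        rounds.append(acks)
    return rounds


n = 60                              # data segments in one reply from B
rounds = acks_of_acks_rounds(n)
first = rounds[0]                   # first round: Acks of Acks from B to A
print(rounds)                       # [10, 3, 1]
print(round(first / n, 3))          # 0.167, relative to the data segments
print(round(first / (n + first), 3))  # 0.143, share of all B->A packets
```

Under these assumptions the first round alone adds n/6 packets from B to 
A, and the later rounds shrink by a factor of three each time, so the first 
round dominates the extra load.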

>> 
>> So, the only case in which I can identify Acks of Acks potentially being 
>> useful is a slightly modified scenario of my Fig 2 where A sends a larger 
>> request, i.e., a number of data pkts, instead of just one.
>> Even in this case the Acks of Acks tend to arrive too late and can be used 
>> only for controlling data rate in the next round when the congestion 
>> information might already be stale. Hence, maybe the only scenario where 
>> Acks of Acks may be useful is when the interaction between A and B is not 
>> delay sensitive and A waits for a while after the data segments of the 
>> reply from B have arrived before it starts sending its next request? 
>> Otherwise, Acks of Acks seem to serve Ack congestion control only, which is 
>> an open issue and definitely not mature for stds track at this time.
>> 
>> To summarise:
>> =============
>> 
>> It seems that Acks of Acks are mostly unnecessary and do not help much in 
>> terms of providing more timely feedback. Therefore, it would be useful to 
>> carefully consider whether it is reasonable to nail down in a stds track 
>> document a behavior (blindly Acking Acks for CE feedback) that in most 
>> cases seems to inject additional packets into the network unnecessarily.
>
>
> [RS] And an implementer is not forced to send out ECT-capable ACKs blindly;

[MK2] I think you misunderstood: I didn't say that ECT-capable ACKs are sent 
blindly, but that Acks of pure Acks are sent blindly (by an automaton), which 
is very different.
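
As a minimal sketch of what "automaton" means here, the following models 
only the pure-Ack part of the quoted Increment-Triggered ACKs rule (n = 3, 
bounded to 3..7), and compares it with the variant suggested earlier in this 
message, in which a CE-marked pure Ack is counted only if it advanced 
SND.UNA. The class, its method, and the flag name are hypothetical and are 
not taken from the draft or from any implementation.

```python
class IncrementTriggeredAcker:
    """Illustrative model of the pure-Ack part of the rule (not normative)."""

    def __init__(self, n=3, only_if_una_advanced=False):
        # The quoted rule text bounds 'n' to 3..7 when there is no new data.
        if not 3 <= n <= 7:
            raise ValueError("n must be in the range 3..7 for pure Acks")
        self.n = n
        # Assumption: models the suggested damping variant when True.
        self.only_if_una_advanced = only_if_una_advanced
        self.ce_since_last_ack = 0

    def on_pure_ack(self, ce_marked, advanced_snd_una):
        """Process one incoming pure Ack; return True if an Ack is emitted."""
        if ce_marked and (advanced_snd_una or not self.only_if_una_advanced):
            self.ce_since_last_ack += 1
        if self.ce_since_last_ack >= self.n:
            self.ce_since_last_ack = 0
            return True
        return False


# Six CE-marked dupAcks, none of which advances SND.UNA.
dupacks = [(True, False)] * 6
blind = IncrementTriggeredAcker()
damped = IncrementTriggeredAcker(only_if_una_advanced=True)
print(sum(blind.on_pure_ack(ce, adv) for ce, adv in dupacks))   # 2
print(sum(damped.on_pure_ack(ce, adv) for ce, adv in dupacks))  # 0
```

The blind variant emits an Ack of Acks for every third CE-marked pure Ack 
regardless of what those Acks were, while the variant counts nothing for 
dupAcks and so stays silent in this example.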

> [RS cont'ed] However, in order to have a consistent and future-proof 
> capability in the receivers, a consistent and timely reflection of the CE 
> marks (and marked bytes) seems good to have; to reiterate - there are 
> examples where signalling protocols designed for one purpose have proven 
> unexpectedly useful (or less so) for dramatically different purposes later 
> on (e.g. Eifel with TSopt, as one of many). It is easy to mitigate all the 
> above objections by unilateral action by the sender (not sending ECT-enabled 
> pure ACKs). However, enabling this feedback if it turns out to be valuable 
> (e.g. retaining CC state while the half-connection is idle in a more 
> appropriate state) would be much harder if the capability does not exist in 
> a dependable fashion in the receivers (CE reflectors).

[MK2] I understand the idea here but, as I have explained, I see more 
potential drawbacks and caveats in doing it like this in a stds track doc 
than I can see benefits (being potentially beneficial in a single scenario 
only, and not even shown to be, sounds to me like quite limited 
justification for becoming a stds track feature).

Thanks,

/Markku

>> 
>> What specifically strikes me is that currently the draft mandates this 
>> behavior with a MUST in Sec 3.2.2.5.1, Increment-Triggered ACKs:
>>
>>   "An AccECN Data Receiver MUST emit an ACK if 'n' CE marks have arrived
>>   since the previous ACK."
>> 
>> Why cannot this be SHOULD?
>
> [RS] See above; consistent "reflector" / receiver behavior is much more 
> valuable for future proofing a signalling protocol, than making aspects of 
> this optional - forcing a future sender to deduce the probable behavior of a 
> particular receiver...
>
>> 
>> This later part of the rule seems to serve ack congestion control only as 
>> discussed above:
>>
>>   "If there is no newly delivered data to acknowledge, 'n' SHOULD be 3 and
>>   MUST be no less than 3. In either case, 'n' MUST be no greater than 7"
>> 
>> If specified like this and one wants to enable ECT-marked pure Acks, it 
>> mandates always conveying feedback on Ack congestion in Acks of Acks.
>> This holds even when the data sender decides to do Ack CC just by reducing 
>> its data rate, which requires no feedback to the data receiver. In that 
>> case, any feedback is likely to be mistreated and the ack rate gets 
>> controlled twice.
>> 
>> If the first MUST in Increment-Triggered ACKs rule is made SHOULD, then one 
>> could omit the later part of the rule. This would require reformulating the 
>> rule, I believe.
>> 
>> Why? An implementor may want to benefit from ECT-marked Acks but avoid 
>> sending unnecessary Acks of Acks.
>> That is, a valid reason for omitting SHOULD would be to separate Ack CC 
>> from data CC such that CE-marked data pkts would be reported as per this 
>> draft but the data sender could benefit from CE-marked Ack info and decide 
>> to apply Ack CC and report it separately or not report it at all to the 
>> other end. The latter would allow the data sender to control ack flow by 
>> controlling its data sending rate, requiring no communication with the 
>> other end. In the former case, the data sender could use some other means 
>> to report Ack congestion to or control Ack congestion with the data 
>> receiver (e.g., Ack Rate option).
>> 
>> I am not claiming that this would be a better approach, but given that 
>> Ack CC is still open research or at least experimental, I think we should 
>> not nail down and restrict how to do it in this stds track document-to-be.
>> 
>>> But if we have not considered a particular corner case, please let us
>>> discuss that!
>> 
>> AFAIK, none of the cases I described is a corner case; each is likely to 
>> occur often in very typical application use cases.
>> 
>