Re: [tcpm] [tsvwg] New Version Notification for draft-grimes-tcpm-tcpsce-00.txt

"Scheffenegger, Richard" <> Fri, 26 July 2019 13:06 UTC


tl;dr: I would like to discuss the architecture of SCE first, before
talking about the specific details of the signaling scheme. Nevertheless,
I will (again, after Bob) raise some concerns about the currently
proposed scheme.

To chime in, I would like to take a step back in this discussion and
understand the fundamental issues we are trying to solve here.

I have not yet fully understood the rationale for SCE requiring two
distinct CE signals.

To me, it appears that the implicit design choice of SCE was for a
router to add an "SCE" signal without classifying whether a particular
flow actually supports SCE.

That way, routers downstream of the SCE-enabled router can still add
CE information - or the SCE router can give a signal of higher
magnitude if the endpoints ignored SCE, as would initially be the norm.

In comparison, the SCE signal appears to be functionally equivalent to
the CE signal in the full L4S architecture (ECN++, ECT1-marked, AccECN
feedback, separate queues in routers) - where the endpoints explicitly
inform the network of this change in the semantics of the CE signal, by
virtue of an explicit signal (ECT1, or a DiffServ Codepoint, as currently
being discussed).

Effectively, in the L4S architecture, an individual CE mark carries much
less "weight" than a 3168 CE mark - but a router is explicitly informed
which of the two "weights" a particular flow works with.

Now, some observations around the currently proposed SCE feedback scheme:

I've read through both

Regarding the SCE signaling, my main feedback would be that the
immediate ACKing when the receiver observes an ECT1 (or a change of state
from CE to non-CE in the case of DCTCP) is brittle. It also "ossifies"
the average proportion of ACKs vs data - while offload NICs try to get
away with as few ACKs as possible (e.g. one per TSO/LRO/GRO superframe,
which may be some 10s of packets - also ACK thinning, ACK compression,
ACK loss).

Early on during AccECN, we had a signaling strategy to reflect two
codepoint-counters back for both ECT1 and CE, which was already pointed
out by Bob.

The idea was to split the 8 possible codepoints in a way to allow more
reliable feedback without a 3168-like handshake method. As CE was deemed
more important than ECT1, we had 3 codepoints assigned to the ECT1
counter (to signal a modulo 3 count of the number of received ECT1
marks) and 5 codepoints to a CE counter (modulo 5). A counter change
would be reflected by inserting the associated codepoint into the next
returned ACK. Otherwise, the two counters would be signaled
alternately (one ACK - however many data segments apart - would have
the CE-related codepoint, the next ACK the ECT1-related codepoint). If
both counters changed in between ACKs, CE would take precedence.
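
A minimal sketch of the two-counter feedback described above, purely for
illustration. The concrete mapping (codepoints 0..4 for the CE counter,
5..7 for the ECT1 counter), the class names and the method names are my
assumptions here, not the assignment from the early AccECN drafts.

```python
CE_MOD = 5    # assumed: codepoints 0..4 carry the CE counter modulo 5
ECT1_MOD = 3  # assumed: codepoints 5..7 carry the ECT1 counter modulo 3


class Receiver:
    """Counts marks on arriving data and picks one codepoint per ACK."""

    def __init__(self):
        self.ce = 0
        self.ect1 = 0
        self.ce_changed = False
        self.ect1_changed = False
        self.last_was_ce = False  # so the first idle ACK carries CE

    def on_data(self, mark):
        if mark == "CE":
            self.ce += 1
            self.ce_changed = True
        elif mark == "ECT1":
            self.ect1 += 1
            self.ect1_changed = True

    def next_ack_codepoint(self):
        if self.ce_changed:        # a CE change takes precedence
            send_ce = True
        elif self.ect1_changed:
            send_ce = False
        else:                      # nothing new: alternate the counters
            send_ce = not self.last_was_ce
        self.last_was_ce = send_ce
        if send_ce:
            self.ce_changed = False
            return self.ce % CE_MOD
        self.ect1_changed = False
        return CE_MOD + self.ect1 % ECT1_MOD


class Sender:
    """Rebuilds both counters from whichever ACKs actually arrive."""

    def __init__(self):
        self.ce = 0
        self.ect1 = 0

    def on_ack(self, codepoint):
        if codepoint < CE_MOD:
            self.ce += (codepoint - self.ce) % CE_MOD
        else:
            self.ect1 += (codepoint - CE_MOD - self.ect1) % ECT1_MOD


rx, tx = Receiver(), Sender()
for mark in ["ECT1", "CE", "ECT0"]:
    rx.on_data(mark)
acks = [rx.next_ack_codepoint() for _ in range(4)]
for cp in acks[1:]:  # pretend the first ACK was thinned away
    tx.on_ack(cp)
print(acks, tx.ce, tx.ect1)
```

Because every later ACK repeats one of the counters, a lost ACK is
repaired by the next ACK that carries the same counter, as long as that
counter advanced by less than its modulus in between.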

During simulation of loss patterns (ack thinning), that approach showed
dramatically better capability to reliably signal both counters back,
except under the most extreme conditions. Obviously, different
assignments of the codepoints would also be possible (e.g. 4 codepoints
to CE, and 4 to ECT1; or even 3 CE, 3 ECT1, 2 ECT0...) depending on the

Again, the main benefit under steady state would be to allow some ACK
thinning or longer gaps in delayed ACKs (while also allowing some ACK
loss as well), making the feedback signal path much less brittle...

And I also want to point out that, with the current AccECN draft, the
ship has not yet sailed: we can still run alternative experiments with a
different feedback scheme.

The AccECN draft clearly states that future extension is possible. On
the <SYN>, only the combinations <0,0,0> (no ECN), <0,1,1> (3168
ECN), and <1,1,1> (AccECN) are explicitly used. A variant of the AccECN
handshake can be used to negotiate a different receiver-side
behavior - while a receiver conforming to the current AccECN draft would
at least provide reliable CE information.
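
To make the remaining room concrete, here is a small sketch that
enumerates the 3-bit <SYN> flag combinations. The three named entries
are the ones listed above; the bit order is taken as written there (I
assume it is AE/NS, CWR, ECE), and the function name and the "not
explicitly used" label are only illustrative.

```python
from itertools import product

# The three combinations named above; the bit order is an assumption.
SYN_FLAG_MEANING = {
    (0, 0, 0): "no ECN",
    (0, 1, 1): "RFC 3168 ECN",
    (1, 1, 1): "AccECN",
}


def classify_syn(flags):
    """Return the meaning of a 3-bit SYN flag combination."""
    return SYN_FLAG_MEANING.get(tuple(flags), "not explicitly used")


# Combinations a variant handshake could, in principle, still claim.
unused = [f for f in product((0, 1), repeat=3) if f not in SYN_FLAG_MEANING]
print(classify_syn((1, 1, 1)), len(unused))
```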

I will look into the scenarios of SCE (which IMHO should be divorced
from a specific feedback scheme - it's really an additional, less
dramatic signal from the network about congestion than 3168 CE).

On a high level, the observation that a ramped probability marking
provides some benefits over a step-change probability matches what has
been reported before from other researchers.

Or to summarize: The step-change in marking from 0% to 100% at a
specific threshold of instantaneous queue depth, as described in DCTCP,
is not the most effective way...
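
For illustration, the two marking styles can be sketched as functions
of the instantaneous queue depth. The function names, the strict ">"
comparison at the threshold and the linear shape of the ramp are my
assumptions; neither DCTCP nor the SCE draft is quoted here.

```python
def step_mark_probability(queue_depth, threshold):
    """Step change: mark nothing up to the threshold, everything above it."""
    return 1.0 if queue_depth > threshold else 0.0


def ramp_mark_probability(queue_depth, low, high):
    """Linear ramp from 0 at `low` to 1 at `high` (requires low < high)."""
    if queue_depth <= low:
        return 0.0
    if queue_depth >= high:
        return 1.0
    return (queue_depth - low) / (high - low)


for depth in (5, 10, 15, 20, 25):
    print(depth,
          step_mark_probability(depth, 10),
          ramp_mark_probability(depth, 10, 20))
```

With the ramp, a small change in queue depth gives a small change in
marking rate, instead of the all-or-nothing flip at the step threshold.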

On 26.07.2019 at 04:54, Rodney W. Grimes wrote:
> Bob,
> Responding to specifics of draft-grimes-tcpm-tcpsce-00.txt
> conversation in this thread as its author.
>> Jonathan,
>> On 25/07/2019 14:53, Jonathan Morton wrote:
>>>> On 25 Jul, 2019, at 2:20 pm, Bob Briscoe <> wrote:
>>>> The idea was to have a generic wire protocol with a dumb receiver, so that the same feedback protocol could support multiple needs for feedback by different TCP congestion control algorithms.
>>>> So a fairly inefficient re-use of the 'NS' TCP header flag for one particular experiment is very unlikely to fly, particularly when the experiment it supports doesn't satisfy all the requirements in 7560.
>>> AccECN burns the same bit to provide higher fidelity feedback of CE, without addressing our need to feed back the distinction between ECT(1) and ECT(0) at all (unless the TCP Option is used).  Since higher fidelity feedback of CE is not useful for SCE, using NS in this way is actually more efficient for us.
>> As you say, AccECN does support SCE - with the option. That was because
>> it was generally agreed that accuracy of the CE signal was paramount,
>> rather than trying to do two things with 3 bits and under-achieving for
>> both.
> It supports TCPSCE only in the sense that once AccECN TWH fails
> to find a L4S endpoint it does fall back to ECN as defined in
> RFC3168 which TCPSCE builds upon by reuse of the NS bit which
> was freed up by RFC8311 (to provide a feedback path for the
> higher fidelity SCE forward mark.)
> Furthermore, by using this fall back position it means that
> none of the remaining AccECN TWH code points that exist today
> would need to be consumed in an attempt to negotiate this use.
> Given that many of the 8 code points created by doing the L4S
> TWH modifications are already consumed by L4S and that if you
> do the modified L4S handshake you gain access to all of the
> ECN TCP layer bits I feel it would be wasteful to consume that
> also limited resource for a single bit access to NS vs a possible
> future 3 bit access to (NS, ECE, CWR)
>> We did have another scheme with 5 & 3 codepoints in the 3 bits for two
>> counters - look back over the draft history. But the WG decided
>> simplicity was also important in a CC protocol, cos lack of bugs is also
>> important.
> Interesting end phrase "lack of bugs is also important", I shall simply
> state that SCE and SCETCP as running code is a significantly small amount
> of code and the fact that a prototype working implementation was completed
> in ~24 man hours speaks to its simplicity and elegance of design.
> Though I would not qualify that as Internet deployable code, it was
> adequate such that we could conduct experiments and debug the IP
> layer SCE marking code being implemented in a Linux versioned middle box.
>>>    Happily, AccECN and SCE can coexist on different flows, thanks to the fact that AccECN does have a negotiation phase which SCE can naturally reject.
>> You will find there is resistance to starting to use the NS flag after
>> having negotiated RFC3168 ECN feedback, but without any explicit
>> negotiation. You will probably be asked how a future experiment would be
>> able to use the NS flag if your experiment fails (assuming it is adopted
>> as IETF work in the first place). You are walking into a world of bit
>> scarcity.
> See above, we are leaving the remaining AccECN TCP TWH code points
> open for future override of NS/ACE, ECE, and CWR rather than consume
> the code points left in that TWH to consume 1 bit in what is effectively
> an unused code point in the TWH (fall back to RFC3168 ECN).
>>>> For instance, I think the reason the tcpsce draft discusses multiple ways of doing the feedback is that, in the presence of pure ACK loss (which is often due to deliberate ACK thinning), none of the three solutions preserve reliable delivery of the ACK signal.
>>> In SCE, *reliable* feedback of SCE signals is not actually required, both because the control loop is naturally stable, and because RFC-3168's CE feedback *is* reliable and thus offers a safe fallback.  Of course we understand that the design constraints for L4S' feedback mechanism were different.
>> My point wasn't so much about safety, it was about accuracy and
>> particularly biased error. The essence of low latency congestion control
>> is keeping the queue low without losing utilization, which requires
>> accuracy. If your feedback has a biased but unknown error, it's really
>> hard to accurately converge towards a target.
> We have found no issues in our testing thus far that would indicate
> that either the (small) error amount or bias direction has a significant
> impact on the convergence rate or target of the closed loop system.
> Including doing a random drop of ack packet test.  I can agree that the
> higher the accuracy of this feedback the better, but the solution is robust
> as designed.  Much as you do not need super high accuracy input to
> a steering wheel to keep a car centered in its curving lane.
> I would not qualify this as "really hard to accurately converge",
> if it was, most closed loop control systems would fail.  If
> on the other hand it was open loop then the accuracy would need
> to be near perfect, is that the issue you're trying to raise?
>>> Ack thinning is also something we have explicitly considered, given that Cake includes an optional ack-filter which does exactly that.  (We have, for example, added consideration of the NS bit to Cake's ack-filter, which was a trivial patch.)  Mathematically, the most extreme errors possible in either direction, due to ack thinning, are easily corrected during subsequent RTTs.
>> Exactly - if your feedback has a biased error, you will usually miss the
>> target and try again in the next round. No point continuing this
>> discussion - I'm just saying you'll need to see what happens in reality
>> when the feedback is thinned.
> We are aware of the ack thinning concerns, and do plan to fully evaluate
> that as a potential problem, though to date our experiments indicate it
> is minimally impacting.  There is also all the discussion going on about
> maybe ack thinning is not such a grand idea on its own merits.
>>>    - Jonathan Morton
>> Bob Briscoe