[tcpm] Tech review of draft-grimes-tcpm-tcpsce-01.txt

Bob Briscoe <ietf@bobbriscoe.net> Fri, 15 November 2019 17:59 UTC

To: rgrimes@freebsd.org, tcpm@ietf.org
Cc: tsvwg@ietf.org, draft-grimes-tcpm-tcpsce@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/Mbdoh-GZCO_fc6e-84ORCKk75-w>

Rod, Pete,

This is a review of the TCP ECN feedback schemes in 
draft-grimes-tcpm-tcpsce-01, on its own terms - i.e. whether it does 
what it aims to do, rather than whether I agree with what it aims to do.

Please do not feel the need to answer this email quickly. It will take 
some working through, although I have tried to summarize for the list as well.

===Terminology===
p% The instantaneous SCE marking probability
d% The delayed ACK ratio, e.g. d%=25% means the receiver ACKs at least 1 
in 4 data packets.

===Assumption===
SCE is intended to cause high levels of SCE marking sometimes, or even 
often. This is particularly so with DCTCP congestion control, where runs 
approaching 100% SCE marking (and certainly >50%) can be common for a 
round trip or so.

===Exec Summary===
Why go through all the years of pain of altering the TCP wire protocol 
for an accurate feedback scheme that isn't accurate?

There are 4 schemes proposed, one of which is NOT RECOMMENDED by the 
authors themselves. Of the other three:

  * 'Simple' inflates the ACK ratio whenever the marking level exceeds
    the delayed ACK ratio (p>d), and more so the higher the marking
    level, which rules it out on safety grounds.
  * 'Dithered' doesn't meet its own stated goal (apparently by definition).
  * 'Advanced' increases the ACK ratio in the mid-range of SCE marking
    levels, although not as badly as 'Simple', and similarly to DCTCP or
    AccECN. However, in this range...
  * ...unlike DCTCP and AccECN, 'Advanced' can toggle between long
    periods of 0% and long periods of 100% feedback, when subjected to
    ACK thinning, or segmentation offload.
  * All three schemes suffer from more and more bias in their error as
    the SCE marking level increases. It is important to eliminate bias,
    because it causes a congestion control scheme to aim for the wrong
    rate and therefore either causes excessive queuing delay, or
    excessive underutilization.

To summarize, in all cases:

            Meets SCE goal #1    Maintains        Unbiased result
            in clean network?    DelACK ratio?    when ACK thinned?
Simple      Y                    N                N
Dithered    N                    Y                N
Advanced    N                    N~               N

Where:
'Y' means OK.
'N' means Not OK.
'N~' means not too bad.

No scheme meets more than one of these three show-stopper requirements.  
It was recognized (see RFC7560) that compromises would probably be 
needed because of the limited header space. But in all three cases, SCE 
makes compromises that are show-stoppers.

===Divergent Goals===
From the email discussion so far, I am told that the goal given in the 
'Dithered' subsection is the goal for all schemes:

    The goal is to have the same number of bytes marked
    with ESCE as arrived with SCE.

(Ironically, dithered never meets this goal, apparently by design)

I'll call this Goal #1.

Actually, as we shall see, each scheme has a different goal for how much 
marking the receiver feeds into the ESCE ACK stream:

  * 'Simple' meets the above goal,
  * 'Dithered' aims for d% of the above goal.
  * 'Advanced' needs to be decoded, rather than being based on the
    amount of marking.

Different receivers seem to be allowed to use different schemes or the 
same scheme with a different parameter (e.g. different values of d% in 
the Dithered case). Each will give different meanings to an ESCE ACK in 
different flows.

What if the sender decodes an ACK stream believing it to be the 
'Advanced' coding, when it is actually 'Dithered' at d%=20%?


===Scheme-by-scheme review===

==4.2 Simple==

    Upon receipt of a packet with an SCE
    codepoint immediate ACK processing SHOULD be done

'Simple' meets Goal #1, but at the expense of causing an ACK ratio of 
max(d%, p%), which is unsafe IMO. For instance, at 75% SCE marking 
(which is not uncommon with a DCTCP congestion control), 75% of data 
packets trigger an ACK. Such a large change in ACK ratio brings a 
significant risk that congestion from the forward path will propagate to 
the reverse path.
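
To put numbers on that, here is a minimal Python sketch of the 
max(d%, p%) estimate above. The function name and the choice of 
d% = 25% are mine, for illustration only; neither comes from the draft.

```python
def simple_ack_ratio(p, d):
    """Estimated ACK ratio under 'Simple': max(d, p), as argued above.

    p: SCE marking probability (0..1)
    d: delayed ACK ratio without marking (0..1)
    """
    return max(d, p)


d = 0.25  # assumption: receiver normally ACKs 1 in 4 data packets
for p in (0.10, 0.25, 0.50, 0.75):
    ratio = simple_ack_ratio(p, d)
    # Inflation is relative to the unmarked delayed ACK ratio d.
    print(f"p={p:.0%}  ACK ratio={ratio:.0%}  inflation x{ratio / d:.1f}")
```

Under that assumed d%, the ACK ratio stays at the delayed ACK ratio until 
p% exceeds d%, then tracks p% directly.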

If subjected to ACK thinning, the more SCE marking rises, the more 
'Simple' under-reports. This is because the proportion of ACKs that are 
ESCE marked rises as the marking level rises. So if a proportion of ACKs 
is thinned blindly, proportionately more ESCE ACKs will be thinned.

==4.3 Dithered==
'Dithered' isn't even defined to meet the goal stated in its own 
section. It is defined to factor down the ESCE ACKs to d_r% of the 
proportion of SCE-marked data packets (i.e. d_r% of p%, where 'r' stands 
for receiver).

As introduced earlier, the sender can't even work out the receiver's 
d_r% heuristically. It can see what ratio of ACKs arrive, d_s% (s for 
sender). But the sender cannot know what d_r% was originally, because it 
doesn't know how much d_r% might have been reduced by ACK thinning, 
either by a middlebox or a NIC.

Actually, the way 'Dithered' is described reveals that this section was 
written after some arm-waving but before any thinking. More on 
'Dithered' later.

==4.4 Advanced==
Advanced meets the goal. But only if no ACKs are lost (and they all 
arrive in order). Its accuracy is sensitive to ACK loss (or reordering), 
because it's a stateful algo. Also, I show below that, as the level of 
SCE marking rises, it will quickly become completely broken in the 
presence of simple ACK thinning.

Let's try an example, with d% = 25%, by typing out a finger-random 
sequence of SCE marks.
     01001011000000101000010111001001....
For this sequence, p% = 100*12/32 = 37.5% > d%.

The resulting ESCE ACK stream will be:
     01-010-1---0-0101---010--1-01-0 ....
That's 13 in 32 ~= 41%, compared to 25% for just DelACKs.

This is the same inflation of the ACK rate as (AccECN+TCP Option) or 
(DCTCP-style feedback). The tcpsce draft rather plays down the impact 
though:

    The ACK
    volume is then inflated only slightly compared to an unmarked
    connection


Unlike the SCE schemes, both AccECN and DCTCP can subsequently be 
thinned or coalesced without the error becoming biased. Here's how 
'Advanced' becomes biased.

Let's run the above sequence through a simple algo that thins every 
other ACK:
Before:
     01-010-1---0-0101---010--1-01-0 ....
Forwarded:
     0--0-0-----0--1-1----1---1--1-- ....
Thinned out:
      1--1--1-----0-0----0-0----0--0 ....

Oh dear. You can see what's happening. It thins out all the 1s for a 
while, then switches to thinning out all the zeros. Not only does this 
completely destroy the feedback signal, it does so with a toggling bias 
that will push the congestion control into large swings. This would 
happen for any algo that thins or coalesces an even proportion of ACKs.

So, the original input:
     01001011000000101000010111001001....
decodes as:
     00000000000011111111111111111   ....
I'd say that's pretty broken.


The problem here is that, for higher SCE marking levels, the Advanced 
scheme turns any random sequence into a regular sequence. And some 
thinning/coalescing algos remove information in a regular way.
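
The every-other-ACK thinning above can be reproduced with a short 
Python sketch. It takes the ESCE ACK stream exactly as typed in the 
example. The function name is mine, and I assume the thinner forwards 
the first ACK it sees and then alternates; a real thinner or offload 
engine may behave differently.

```python
# ESCE ACK stream from the example: '0'/'1' is an ACK without/with ESCE,
# '-' is a data packet position at which no ACK was sent.
ACKS = "01-010-1---0-0101---010--1-01-0"


def thin_every_other(stream):
    """Return (forwarded, dropped) after dropping every second ACK."""
    forwarded, dropped = [], []
    keep = True  # assumption: the first ACK is forwarded
    for ch in stream:
        if ch == "-":
            # No ACK at this position in either output.
            forwarded.append("-")
            dropped.append("-")
        elif keep:
            forwarded.append(ch)
            dropped.append("-")
            keep = False
        else:
            forwarded.append("-")
            dropped.append(ch)
            keep = True
    return "".join(forwarded), "".join(dropped)


forwarded, dropped = thin_every_other(ACKS)
print(forwarded)                    # 0--0-0-----0--1-1----1---1--1--
print(dropped)                      # -1--1--1-----0-0----0-0----0--0
print(forwarded.replace("-", ""))   # 000011111
```

The last line shows what survives: a run of four 0s followed by a run of 
five 1s, even though the marks in the original stream were interleaved.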

==4.5 ACK Thinning==

    Mathematically, the most extreme errors possible in either direction,
    due to ack thinning, are easily corrected during subsequent RTTs.

Starting a sentence with 'mathematically' makes it sound impressive, but 
it's not if you haven't actually done the maths. Even if the error in a 
control signal is unbiased (on average too high as much as too low), 
subsequent RTTs only correct it in cases when the traffic load is in 
steady state. That's hardly ever the case on the Internet, so error will 
not self-correct.

If you are trying to control the /dynamics/ of traffic flows with a 
signal that might sometimes aim too low and sometimes too high, the 
traffic will swing around more wildly, causing more queuing delay and 
less utilization.

Worse, thinning any of the SCE-TCP schemes does not just introduce 
error, it introduces /biased/ error, as explained in each case above. 
In the case of 'Advanced' the bias toggles at random points. Then if 
it's already aiming towards too much delay (because the error is 
under-reporting), half the time it might swing towards even more delay 
and half the time towards less.

In summary, why go to all the bother of introducing a high-fidelity SCE 
signal, then feeding it back inaccurately, and lop-sidedly?

==Dithered: How to Implement?==

    If some
    of the packets to be ACKed have SCE state set then some proportional
    number of ACK packets SHOULD be sent with the ESCE code point set.

Let's say d=50%. After 2 data packets, let's say one is SCE-marked, so 
we have to send half the next ACK packets with ESCE. But we only have 
one ACK for these two packets...

OK, we send half the next 2 ACK packets with ESCE. But then we've 
accumulated another 2 data packets, with 0, 1 or 2 SCE marks needing 
feedback. We need 2 more ACKs. But then we'll have between 0 and 4 more 
SCE marks to feed back. This will always snowball.

Hmmm. From its description, it looks like 'Dithered' wasn't thought 
through 'much'.

One could set each ESCE probabilistically, based on the fraction of 
marks since the last ACK. That would be noisy feedback, not accurate 
feedback though.
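
As a minimal sketch of that probabilistic option (my own reading, not 
anything specified in the draft), the receiver could set ESCE on each 
delayed ACK with probability equal to the fraction of SCE-marked packets 
that the ACK covers. The function name, the parameter k (data packets 
per ACK, i.e. 1/d) and the fixed seed are all assumptions for 
illustration.

```python
import random


def probabilistic_esce(marks, k, rng):
    """Return one ESCE bit per ACK, where each ACK covers k data packets.

    marks: list of 0/1 SCE marks, one per data packet
    k:     data packets per ACK (assumed fixed, k = 1/d)
    rng:   random.Random instance supplying the coin flips
    """
    esce = []
    for i in range(0, len(marks) - k + 1, k):
        fraction = sum(marks[i:i + k]) / k  # marked fraction in this window
        esce.append(1 if rng.random() < fraction else 0)
    return esce


# The finger-random SCE sequence used earlier, with d% = 25% (k = 4).
marks = [int(c) for c in "01001011000000101000010111001001"]
rng = random.Random(1)  # fixed seed only so that runs are repeatable
print(probabilistic_esce(marks, 4, rng))
```

Only windows with no marks or all marks give a deterministic ESCE bit. 
Every other window is a coin flip, which is where the noise comes from.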

Instead, one could multiply using a repetitive add-subtract algo, to 
multiply the number of packets between SCE marks by d%, in order to 
count ACKs until the next ESCE feedback.

Even though that removes the noise of randomness, it's still a very 
inaccurate way to be accurate. For instance, for d%=25% and p%=1%, one 
in every 100 packets would be SCE marked, so one in every 25 ACKs would 
be ESCE marked.

The problem is it's only communicating 1 bit per ACK. If instead it 
exploited n bits per ACK it would be able to maintain precision while 
sending ACKs 2^n times less frequently.

===Bigger Picture===
A new feedback scheme requires a change to the TCP wire protocol, which 
is one of the core Internet protocols. This requires change at both ends 
before it will work. Therefore, ideally, a new feedback protocol needs 
to be more generic than any one feedback scheme that might use aspects 
of the protocol.

That is why a requirements exercise [RFC7560] preceded AccECN, which 
took years in itself. The requirements admit that compromises will be 
needed. So they are more a checklist of things to consider. And you'll 
need a good reason, if you don't think you need to meet them.

There are also probably requirements missing from RFC7560, 'cos it's hard 
to think of requirements in the abstract, particularly subtle ones.

AccECN is close to becoming more frozen, but it's still open to change. 
You might want to think of other ways of repurposing it, while the 
opportunity to make tweaks is still open.


Bob

On 04/11/2019 22:04, Rodney W. Grimes wrote:
> All, Cross posting to tsvwg and tcpm due to overlap in areas covered,
>
> I have just posted an updated version of TCPSCE which hopefully addresses
> most of the feedback provided since the original publication.
>
> The document structure is unchanged, and the history tool works well
> to show what has been changed since the last revision.
>
> Thanks to those who have provided feedback, and assistance in updating
> this draft.
>
> Regards,
> Rod
>
>> A new version of I-D, draft-grimes-tcpm-tcpsce-01.txt
>> has been successfully submitted by Rodney W. Grimes and posted to the
>> IETF repository.
>>
>> Name:		draft-grimes-tcpm-tcpsce
>> Revision:	01
>> Title:		Some Congestion Experienced in TCP
>> Document date:	2019-11-04
>> Group:		Individual Submission
>> Pages:		6
>> URL:            https://www.ietf.org/internet-drafts/draft-grimes-tcpm-tcpsce-01.txt
>> Status:         https://datatracker.ietf.org/doc/draft-grimes-tcpm-tcpsce/
>> Htmlized:       https://tools.ietf.org/html/draft-grimes-tcpm-tcpsce-01
>> Htmlized:       https://datatracker.ietf.org/doc/html/draft-grimes-tcpm-tcpsce
>> Diff:           https://www.ietf.org/rfcdiff?url2=draft-grimes-tcpm-tcpsce-01
>>
>> Abstract:
>>     This memo classifies a TCP code point ESCE ("Echo Some Congestion
>>     Experienced") for use in feedback of IP code point SCE ("Some
>>     Congestion Experienced").
>>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/