[tcpm] Tech review of draft-grimes-tcpm-tcpsce-01.txt
Bob Briscoe <ietf@bobbriscoe.net> Fri, 15 November 2019 17:59 UTC
To: rgrimes@freebsd.org, tcpm@ietf.org
Cc: tsvwg@ietf.org, draft-grimes-tcpm-tcpsce@ietf.org
References: <201911042204.xA4M4Zob002799@gndrsh.dnsmgr.net>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <8001dae3-0873-5ba1-d603-a9d8661b41ed@bobbriscoe.net>
Date: Fri, 15 Nov 2019 17:59:28 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/Mbdoh-GZCO_fc6e-84ORCKk75-w>
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>

Rod, Pete,

This is a review of the TCP ECN feedback schemes in
draft-grimes-tcpm-tcpsce-01, on its own terms - i.e. whether it does
what it aims to do, rather than whether I agree with what it aims to
do.

Please do not feel the need to answer this email quickly. It will take
some working through, altho I have tried to summarize for the list as
well.

===Terminology===

p%  The instantaneous SCE marking probability
d%  The delayed ACK ratio, e.g. d%=25% means the receiver ACKs at
    least 1 in 4 data packets.

===Assumption===

SCE is intended to sometimes or even often cause high levels of SCE
marking, particularly when using DCTCP congestion control, when runs
approaching 100% SCE can be common for a round trip or so, and
certainly >50%.

===Exec Summary===

Why go through all the years of pain of altering the TCP wire protocol
for an accurate feedback scheme that isn't accurate?

There are 4 schemes proposed, one of which is NOT RECOMMENDED by the
authors themselves. Of the other three:

* 'Simple' multiplies up the Delayed ACK ratio the more the marking
  level is above the delayACK ratio (p>d), which rules it out on
  safety grounds.
* 'Dithered' doesn't meet its own stated goal (apparently by
  definition).
* 'Advanced' increases the ACK ratio in the mid-range of SCE marking
  levels. However, not as badly as 'Simple', and similar to DCTCP or
  AccECN. However, in this range...
* ...unlike DCTCP and AccECN, 'Advanced' can toggle between long
  periods of 0% and long periods of 100% feedback, when subjected to
  ACK thinning, or segmentation offload.
* All three schemes suffer from more and more bias in their error as
  the SCE marking level increases. It is important to eliminate bias,
  because it causes a congestion control scheme to aim for the wrong
  rate and therefore either causes excessive queuing delay, or
  excessive underutilization.

To summarize, in all cases:

            Meets SCE goal#1    Maintains        Unbiased result
            in clean network?   DelACK ratio?    when ACK thinned?
  Simple           Y                 N                  N
  Dithered         N                 Y                  N
  Advanced         N                 N~                 N

Where:
  'Y'  means OK.
  'N'  means Not OK.
  'N~' means not too bad.

No scheme meets more than one of these three show-stopper
requirements. It was recognized (see RFC7560) that compromises would
probably be needed because of the limited header space. But in all
three cases, SCE makes compromises that are show-stoppers.

===Divergent Goals===

From the email discussion so far, I am told that the goal given in the
'Dithered' subsection is the goal for all schemes:

    The goal is to have the same number of bytes marked with ESCE as
    arrived with SCE.

(Ironically, dithered never meets this goal, apparently by design)
I'll call this Goal #1.

Actually, as we shall see, each scheme has a different goal for how
much marking the receiver feeds into the ESCE ACK stream:

* 'Simple' meets the above goal,
* 'Dithered' aims for d% of the above goal.
* 'Advanced' needs to be decoded, rather than being based on the
  amount of marking.

Different receivers seem to be allowed to use different schemes or the
same scheme with a different parameter (e.g. different values of d% in
the Dithered case). Each will give different meanings to an ESCE ACK
in different flows. What if the sender decodes an ACK stream believing
it to be the 'Advanced' coding, when it is actually 'Dithered' at
d%=20%?

===Scheme-by-scheme review===

==4.2 Simple==

    Upon receipt of a packet with an SCE codepoint immediate ACK
    processing SHOULD be done

'Simple' meets Goal #1, but at the expense of causing an ACK ratio of
max(d%, p%), which is unsafe IMO. For instance, at 75% SCE marking
(which is not uncommon with a DCTCP congestion control), 75% of data
packets trigger an ACK. Such a large change in ACK ratio brings a
significant risk that congestion from the forward path will propagate
to the reverse path.

If subjected to ACK thinning, the more SCE marking rises, the more
'Simple' under-reports.
This is because the proportion of ACKs that are ESCE marked rises as
the marking level rises. So if a proportion of ACKs is thinned
blindly, proportionately more ESCE ACKs will be thinned.

==4.3 Dithered==

'Dithered' isn't even defined to meet the goal stated in its own
section. It is defined to factor down the ESCE ACKs to d_r% of the
proportion of SCE-marked data packets (i.e. d_r% of p%, where 'r'
stands for receiver).

As introduced earlier, the sender can't even work out the receiver's
d_r% heuristically. It can see what ratio of ACKs arrive, d_s% (s for
sender). But the sender cannot know what d_r% was originally, because
it doesn't know how much d_r% might have been reduced by ACK thinning,
either by a middlebox or a NIC.

Actually, the way 'Dithered' is described reveals that this section
was written after some arm-waving but before any thinking. More on
'Dithered' later.

==4.4 Advanced==

Advanced meets the goal. But only if no ACKs are lost (and they all
arrive in order). Its accuracy is sensitive to ACK loss (or
reordering), because it's a stateful algo. Also, I show below that, as
the level of SCE marking rises, it will quickly become completely
broken in the presence of simple ACK thinning.

Let's try an example, with d% = 25%, by typing out a finger-random
sequence of SCE marks.

    01001011000000101000010111001001....

For this sequence, p% = 100*12/32 = 37.5% > d%.

The resulting ESCE ACK stream will be:

    01-010-1---0-0101---010--1-01-0 ....

That's 13 in 32 ~= 41%, compared to 25% for just DelACKs. This is the
same inflation of the ACK rate as (AccECN+TCP Option) or (DCTCP-style
feedback). The tcpsce draft rather plays down the impact though:

    The ACK volume is then inflated only slightly compared to an
    unmarked connection

Unlike the SCE schemes, both AccECN and DCTCP can subsequently be
thinned or coalesced without the error becoming biased.

Here's how 'Advanced' becomes biased.
Let's run the above sequence through a simple algo that thins every
other ACK:

    Before:      01-010-1---0-0101---010--1-01-0 ....
    Forwarded:   0--0-0-----0--1-1----1---1--1-- ....
    Thinned out:  1--1--1-----0-0----0-0----0--0 ....

Oh dear. You can see what's happening. It thins out all the 1s for a
while, then switches to thinning out all the zeros. Not only does this
completely destroy the feedback signal, it does so with a toggling
bias that will push the congestion control into large swings. This
would happen for any algo that thins or coalesces an even proportion
of ACKs.

So, the original input:

    01001011000000101000010111001001....

decodes as:

    00000000000011111111111111111 ....

I'd say that's pretty broken.

The problem here is that, for higher SCE marking levels, the Advanced
scheme turns any random sequence into a regular sequence. And some
thinning/coalescing algos remove information in a regular way.

==4.5 ACK Thinning==

    Mathematically, the most extreme errors possible in either
    direction, due to ack thinning, are easily corrected during
    subsequent RTTs.

Starting a sentence with 'mathematically' makes it sound impressive,
but it's not if you haven't actually done the maths.

Even if the error in a control signal is unbiased (on average too high
as much as too low), subsequent RTTs only correct it in cases when the
traffic load is in steady state. That's hardly ever the case on the
Internet, so error will not self-correct. If you are trying to control
the /dynamics/ of traffic flows with a signal that might sometimes aim
too low and sometimes too high, the traffic will swing around more
wildly, causing more queuing delay and less utilization.

Worse, thinning any of the SCE-TCP schemes not only introduces error,
it introduces /biased/ error, as explained in each case above. In the
case of 'Advanced' the bias toggles at random points.
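
As a sanity check, the every-other-ACK thinning of the 'Advanced'
example in 4.4 can be replayed with a few lines of Python. This is
only a sketch: the function name thin_every_other, and the choice to
forward the 1st, 3rd, 5th... ACK and drop the rest, are assumptions
made for illustration and are not taken from the draft. The two input
strings are copied from the example above.

```python
def thin_every_other(ack_stream: str) -> tuple:
    """Split the ESCE marks carried by ACKs into (forwarded, dropped).

    In ack_stream, '-' means no ACK was sent for that data packet,
    '0' is an ACK without ESCE and '1' is an ACK with ESCE.
    Assumption: the thinner forwards the 1st, 3rd, 5th... ACK and
    drops the 2nd, 4th, 6th...
    """
    acks = [c for c in ack_stream if c in "01"]
    forwarded = "".join(acks[0::2])
    dropped = "".join(acks[1::2])
    return forwarded, dropped


# SCE marks and 'Advanced' ACK stream, copied from the example in 4.4.
sce_marks = "01001011000000101000010111001001"
before = "01-010-1---0-0101---010--1-01-0"

# Marking level of the finger-random sequence, in percent.
p_percent = 100 * sce_marks.count("1") / len(sce_marks)
forwarded, dropped = thin_every_other(before)

print(p_percent)   # 37.5
print(forwarded)   # 000011111
print(dropped)     # 111000000
```

The forwarded marks are a run of 0s followed by a run of 1s, and the
dropped marks are the mirror image, which is the toggling pattern
shown in the 'Forwarded' and 'Thinned out' lines.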
If the congestion control is already aiming towards too much delay
(because the error is under-reporting), then half the time the
toggling bias might swing it towards even more delay and half the time
towards less.

In summary, why go to all the bother of introducing a high-fidelity
SCE signal, then feeding it back inaccurately, and lop-sidedly?

==Dithered: How to Implement?==

    If some of the packets to be ACKed have SCE state set then some
    proportional number of ACK packets SHOULD be sent with the ESCE
    code point set.

Let's say d=50%. After 2 data packets, let's say one is SCE-marked, so
we have to send half the next ACK packets with ESCE. But we only have
one ACK for these two packets...

OK, we send half the next 2 ACK packets with ESCE. But then we've
accumulated another 2 data packets, with 0, 1 or 2 SCE marks needing
to be fed back. We need 2 more ACKs. But then we'll have between 0 and
4 more SCE marks to feed back. This will always snowball.

Hmmm. From its description, it looks like 'Dithered' wasn't thought
through 'much'.

One could set each ESCE probabilistically, based on the fraction of
marks since the last ACK. That would be noisy feedback, not accurate
feedback though.

Instead, one could multiply using a repetitive add-subtract algo, to
multiply the number of packets between SCE marks by d%, in order to
count ACKs until the next ESCE feedback. Even though that removes the
noise of randomness, it's still a very inaccurate way to be accurate.
For instance, for d%=25% and p%=1%, 1 in every 100 packets would be
SCE marked, so 1 in every 25 ACKs would be ESCE marked. The problem is
it's only communicating 1 bit per ACK. If instead it exploited n bits
per ACK it would be able to maintain precision while sending ACKs 2^n
times less frequently.

===Bigger Picture===

A new feedback scheme requires a change to the TCP wire protocol,
which is one of the core Internet protocols. This requires change at
both ends before it will work.
Therefore, ideally, a new feedback protocol needs to be more generic
than any one feedback scheme that might use aspects of the protocol.

That is why a requirements exercise [RFC7560] preceded AccECN, which
took years in itself. The requirements admit that compromises will be
needed. So they are more a checklist of things to consider. And you'll
need a good reason, if you don't think you need to meet them. There's
also probably requirements missing from RFC7560, 'cos it's hard to
think of requirements in the abstract, particularly subtle ones.

AccECN is close to becoming more frozen, but it's still open to
change. You might want to think of other ways of repurposing it, while
the opportunity to make tweaks is still open.

Bob

On 04/11/2019 22:04, Rodney W. Grimes wrote:
> All, Cross posting to tsvwg and tcpm due to overlap in areas covered,
>
> I have just posted an updated version of TCPSCE which hopefully addresses
> most of the feedback provided since the original publication.
>
> The document structure is unchanged, and the history tool works well
> to show what has been changed since the last revision.
>
> Thanks to those who have provided feedback, and assistance in updating
> this draft.
>
> Regards,
> Rod
>
>> A new version of I-D, draft-grimes-tcpm-tcpsce-01.txt
>> has been successfully submitted by Rodney W. Grimes and posted to the
>> IETF repository.
>>
>> Name:           draft-grimes-tcpm-tcpsce
>> Revision:       01
>> Title:          Some Congestion Experienced in TCP
>> Document date:  2019-11-04
>> Group:          Individual Submission
>> Pages:          6
>> URL:      https://www.ietf.org/internet-drafts/draft-grimes-tcpm-tcpsce-01.txt
>> Status:   https://datatracker.ietf.org/doc/draft-grimes-tcpm-tcpsce/
>> Htmlized: https://tools.ietf.org/html/draft-grimes-tcpm-tcpsce-01
>> Htmlized: https://datatracker.ietf.org/doc/html/draft-grimes-tcpm-tcpsce
>> Diff:     https://www.ietf.org/rfcdiff?url2=draft-grimes-tcpm-tcpsce-01
>>
>> Abstract:
>> This memo classifies a TCP code point ESCE ("Echo Some Congestion
>> Experienced") for use in feedback of IP code point SCE ("Some
>> Congestion Experienced").
>>
>>
>>

--
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/