Re: [tsvwg] Follow-up to your DSCP and ECN codepoint comments at tsvwg interim

Bob Briscoe <> Sun, 08 March 2020 14:18 UTC

From: Bob Briscoe <>
To: Steven Blake <>
Cc: tsvwg IETF list <>
Date: Sun, 08 Mar 2020 14:18:17 +0000
Subject: Re: [tsvwg] Follow-up to your DSCP and ECN codepoint comments at tsvwg interim


Exec summary:
*If you hobble an experiment too much unnecessarily, you make it fail.*

See inline for the details. And the details *matter*...

On 07/03/2020 03:49, Steven Blake wrote:
> Hi Bob. Long time, no see ...
> On Sat, 2020-03-07 at 00:34 +0000, Bob Briscoe wrote:
>> Steve,
>>> - Steve Blake mentioned that L4S could be made compatible with
>>> each.
>> Can you say more about what's behind this idea? or point me to where
>> you've already said it?
> L4S could use:
> - EXP DSCP to select low-latency queue
> - Optional: EXP DSCP to select the coupled queue (for specific non-TCP
>    Prague traffic being used as part of the experiment)
> - SCE-style ECN signaling (ECT(1) and/or CE)

That seems possible... until you realize the codepoint traversal
problems around the feedback loop (below). Then the flakiness of
codepoint traversal disappoints so many potential users that the whole
experiment gets a rubbish reputation, even if the performance is cool
when it does work.

Even if we'd taken that approach 5 years ago, it still wouldn't be
working today, because of the problems below.
The approach we chose worked immediately 5 years ago, and it still works
today. We knew that it required strong IETF blessing. We got that at the
time. OK, now the WG has decided to check that decision. But, please,
after that, can everyone get behind whatever decision is made?

  * SCE-style signalling propagates out of hardly any layerings on the
    current Internet (tunnels, lower layers).
      o The original ECN tunnelling specs didn't consider ECT1 as a
        lower-strength CE. So if outer/inner is ECT1/ECT0, RFC3168
        and IPsec decapsulators will just strip the outer ECT1.
      o The drafts that extend RFC6040 decapsulation behaviour to all
        the other layering protocols aren't even through WGLC yet
        (ecn-encap-guidelines and rfc6040update-shim). And even for
        IP-in-IP tunnels, a large proportion don't comply with RFC6040 yet.
      o Tunnels / pseudowires / layerings / VPNs / etc are often 'over
        the top' of the network operator wanting to run the experiment.
  * SCE-style signalling would still need a transport layer feedback
    protocol that gave hi-fidelity ECT1 feedback.
      o But the approach proposed for SCE TCP feedback (individual draft
        in tcpm) would be randomly decimated by all the ACK thinning in
        the real Internet (WiFi, DOCSIS, LTE, etc).
      o Using QUIC is an alternative, but we have to support TCP as
        well. All the existing apps we have demonstrated "just worked"
        when we updated the TCP stack under them. To use QUIC, all these
        existing TCP apps would have to be updated to use QUIC. That
        would be a huge extra barrier.
      o TCPM has allocated 3 bits in the main TCP header for hi-fidelity
        CE feedback 'cos those bits traverse networks fairly reliably.
        The chances of TCPM allocating these flags for ECT1 instead of
        CE, or allocating more reserved flags for an ECT1 feedback
        experiment, are nil, I'd have thought.
      o So, realistically, to get hi-fi ECT1 feedback, you'd need a TCP
        Option. But none of the big end-system developers plan to
        implement the TCP Option part of AccECN, to avoid middlebox hell.
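The tunnelling point in the first bullet can be made concrete. The Python sketch below contrasts the RFC6040 decapsulation table with a simplified reading of the older RFC3168/IPsec behaviour, in which only an outer CE is copied onto the inner header. The function names and the simplified legacy rule are assumptions for illustration, not code from any real decapsulator.

```python
from enum import IntEnum
from typing import Optional


class Ecn(IntEnum):
    """The four values of the 2-bit IP ECN field."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


N, E0, E1, CE = Ecn.NOT_ECT, Ecn.ECT0, Ecn.ECT1, Ecn.CE

# RFC 6040 decapsulation: (arriving inner, arriving outer) -> forwarded inner.
# None means the packet is dropped.
RFC6040_DECAP = {
    (N, N): N,   (N, E0): N,   (N, E1): N,   (N, CE): None,
    (E0, N): E0, (E0, E0): E0, (E0, E1): E1, (E0, CE): CE,
    (E1, N): E1, (E1, E0): E1, (E1, E1): E1, (E1, CE): CE,
    (CE, N): CE, (CE, E0): CE, (CE, E1): CE, (CE, CE): CE,
}


def rfc6040_decap(inner: Ecn, outer: Ecn) -> Optional[Ecn]:
    """Look up the forwarded inner codepoint in the RFC 6040 table."""
    return RFC6040_DECAP[(inner, outer)]


def legacy_decap(inner: Ecn, outer: Ecn) -> Optional[Ecn]:
    """Simplified pre-RFC6040 rule (an assumption for this sketch): only an outer
    CE is copied onto an ECN-capable inner header; any other outer codepoint,
    including ECT1, is discarded. Corner cases such as an outer CE over a
    Not-ECT inner are not modelled precisely."""
    if outer == CE and inner in (E0, E1):
        return CE
    return inner
```

With outer/inner = ECT1/ECT0, the legacy rule forwards ECT0, so an SCE-style ECT1 signal applied inside the tunnel is lost, whereas an RFC6040 decapsulator forwards ECT1.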

And why would Google/Apple/Microsoft etc. want to go through
middlebox hell for an SCE-ECT1 experiment that isn't going to traverse
most networks anyway?
And why would tunnel developers ensure SCE-ECT1 traverses their tunnels
for an experiment that none of the big end-system companies have bought
into?
So SCE-style signalling would work for some people sometimes, and 
randomly fail for most people.
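To illustrate the ACK-thinning point above, here is a toy Python model, not any real TCP stack. It compares a hypothetical per-ACK flag echo with an AccECN-style cumulative 3-bit counter when a middlebox forwards only every second ACK. The counter here counts generic marks (in AccECN the ACE field counts CE marks), and all function names are invented for this sketch.

```python
from typing import List

ACE_MODULO = 8  # a 3-bit counter wraps at 8, like the AccECN ACE field


def per_ack_flags(marks: List[bool]) -> List[int]:
    """Hypothetical scheme: each ACK echoes only whether the packet it acknowledges was marked."""
    return [int(m) for m in marks]


def cumulative_counters(marks: List[bool]) -> List[int]:
    """AccECN-style scheme: each ACK carries the running count of marks modulo 8."""
    counters, count = [], 0
    for m in marks:
        count += int(m)
        counters.append(count % ACE_MODULO)
    return counters


def thin(acks: List[int], keep_every: int) -> List[int]:
    """Toy ACK thinning: forward only the last ACK of every group of keep_every."""
    return acks[keep_every - 1::keep_every]


def marks_seen_from_flags(acks: List[int]) -> int:
    """The sender can only count the flags on the ACKs that survive."""
    return sum(acks)


def marks_seen_from_counters(acks: List[int]) -> int:
    """Sum the increments between successive counter values, allowing for wrap-around."""
    total, last = 0, 0
    for c in acks:
        total += (c - last) % ACE_MODULO
        last = c
    return total
```

With every third packet marked over 30 packets (10 marks) and every second ACK removed, the flag scheme reports 5 marks while the counter scheme still recovers all 10. The counter only survives thinning while fewer than 8 marks arrive between surviving ACKs; beyond that the 3-bit field wraps and the sender under-counts.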

Steve, please, I'm not stupid. We've thought of all these things. We 
thought of them years ago.
This is why I referred you to the pros and cons in the draft.

Then, *even if* your proposed way of deploying the experiment succeeded, 
how would it migrate to full Internet deployment? If it continued to use 
SCE-ECT1, it would not be classifiable into separate queues without 
DSCPs as well.

Otherwise it would only work for FQ, precluding it from high-speed kit
(and from lower-layer AQMs, where port numbers are inaccessible).
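A minimal Python sketch of the two classification options, assuming a DualQ-style classifier with an L (low-latency) queue and a C (Classic) queue. The ECN rule sends ECT1 and CE to the L queue; the DSCP rule uses a hypothetical experimental DSCP value (0b000111, chosen only for illustration). It also shows the bleaching concern raised in the quoted text below: once a network rewrites the DSCP to zero, the DSCP-based classifier silently falls back to the C queue, while the ECN-based one is unaffected.

```python
from dataclasses import dataclass, replace
from enum import IntEnum


class Ecn(IntEnum):
    """The four values of the 2-bit IP ECN field."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


# Hypothetical experimental DSCP, for illustration only; not an assigned codepoint.
EXP_DSCP = 0b000111


@dataclass(frozen=True)
class Packet:
    dscp: int
    ecn: Ecn


def classify_by_ecn(pkt: Packet) -> str:
    """ECN-based identifier: ECT(1) and CE go to the L queue, everything else to C."""
    return "L" if pkt.ecn in (Ecn.ECT1, Ecn.CE) else "C"


def classify_by_dscp(pkt: Packet) -> str:
    """DSCP-based identifier: only the experimental DSCP goes to the L queue."""
    return "L" if pkt.dscp == EXP_DSCP else "C"


def bleach_dscp(pkt: Packet) -> Packet:
    """A domain that bleaches Diffserv resets the DSCP to 0 (default forwarding)."""
    return replace(pkt, dscp=0)
```

The design point is that the ECN field is already required for the experiment, so using it as the identifier adds no extra codepoint that has to survive every domain on the path.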

If you hobble an experiment too much unnecessarily, you make it fail.

>>> - Steve Blake noted anyone running these experiments will tinker
>>> with boxes, so may handle the DSCP bleaching concern.
>> Unfortunately, that's only partially true. It's fairly true for
>> CDN->user applications. However, for (say) conversational video or
>> online gaming, AR, VR, etc, which are particularly important target
>> apps, the path is often either accessISP-accessISP or
>> accessISP-cloud-accessISP. Although most access ISPs nowadays are
>> pretty tightly meshed with other access ISPs (with fewer
>> intermediaries than there used to be), there are still significant
>> numbers of cases where two access ISPs are not directly connected.
>> Anyway, if user B's downstream is the bottleneck for user A, even if
>> ISP B doesn't bleach Diffserv, the DSCP is still no use if ISP A
>> bleaches, or if an intermediary does.
>> See
>> for write-up of the pros and cons.
>> Network traversal is already a difficult problem, even with just a
>> very few ECN problems. We really don't want to make this impossible.
>> With a 3-part (sender-bottleneck-receiver) L4S deployment, there is
>> already enough deployment barrier without adding Diffserv traversal
>> to make 4 or more parts (sender-border-bottleneck-receiver). For SCE,
>> there are already 4 parts to deploy (tunnels added), so let's not
>> countenance making that 5 or more.
> You are proposing to do an experiment. The experiment requires
> cooperation between one or more operators that have to deploy and
> configure equipment that frankly doesn't exist yet (at least not for
> real core link speeds).
Later you complain that equipment vendors have pre-empted the IETF and 
already built this into silicon, presumably based on the L4S status 
report at the last IETF:
You cannot also complain that implementations don't exist yet.

Most of the interest is for access networks, where most bottlenecks are.
The DualQ is designed to scale to core speeds. But that would usually be
pretty pointless, given that, by good network design, bottlenecks only
very occasionally move to the core.

Way more operators have plans for L4S DualQ deployment (even before the 
RFCs are baked) than have ever enabled Classic ECN in the last 2 
decades. Surely you can work out that the implementations are happening 
because operators are showing commercial interest to those implementers.

(See later about whether to view this as pre-empting the IETF, or taking 
a risk while the IETF deliberates.)

> Modifying current operational practice
> regarding configuration/use of DSCPs for traffic intended to
> participate in the experiment is a minor inconvenience compared to the
> other operational changes that are required to perform the experiment.
Yes. It's the SCE part of your proposal that's much more problematic 
than the DSCP part. But the DSCP part is also just an unnecessary 

What's the problem with the proposed L4S approach? In a nutshell:

  * ECT1 appears not to be currently used on the Internet.
  * The experimental traffic has to be ECN-capable.
  * So using ECT1 as the identifier of the experimental traffic is
  * It works. And it is identifiable - it can be ignored/blocked/stopped
    if necessary. But there's no evidence that will be necessary.

Why cause a load of extra deployment grief by requiring another
identifier (DSCP) as well?
And why switch to using SCE-style, which makes it not work over real 
networks, with real TCP?
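The deployment-barrier argument can be put as a rough product, under the simplifying and purely hypothetical assumption that each of the n parts a path needs (sender, bottleneck, receiver, border, tunnel, ...) is in place independently with probability p_i:

```latex
P_n = \prod_{i=1}^{n} p_i ,
\qquad
\frac{P_{n+1}}{P_n} = p_{n+1} \le 1
```

So every extra required part, such as Diffserv traversal at a border, can only lower the chance that the whole chain works. For example, with a hypothetical p_i = 0.9 for every part, three parts give 0.9^3 = 0.729 while four give 0.9^4 = 0.6561.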

> I am not even slightly convinced that we understand the best approach
> to use for AQM on >= 100 GE links with tens of thousands of active
> flows (I'm a signal processing guy; it will probably require FFTs.) :)
> In fact, and you should appreciate this, if the goal is to minimize
> queueing delay and delay variation, IMHO we should revisit some of the
> PCN ideas for detecting pre-congestion on a node, even for slower links
> (think AVQ-style AQM).
Surely you don't think I hadn't thought of this? :)
That is the longer-term plan.

A virtual queue is already possible in the DOCSIS implementations, via
operator config, so they are future-proofed. The operator just has to
configure the max sustained rate of the user's aggregate service flow
(i.e. the rate for both the L and C queues) a little lower than the
bandwidth of the channel.
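As a hedged illustration of the general virtual-queue idea (not the DOCSIS implementation, whose internals are not described here), the Python sketch below drains a virtual backlog at a fraction of the real link rate and marks when that virtual backlog exceeds a threshold. The class name, the 0.95 rate factor and the 3000-byte threshold are hypothetical parameters.

```python
from dataclasses import dataclass


@dataclass
class VirtualQueue:
    """Toy virtual queue: a byte counter drained at rate_factor * link rate.

    Hypothetical parameters; a real AQM would typically be more elaborate.
    """
    link_rate_bps: float
    rate_factor: float = 0.95        # virtual drain rate as a fraction of the link rate
    threshold_bytes: float = 3000.0  # mark when the virtual backlog exceeds this
    backlog_bytes: float = 0.0
    last_time_s: float = 0.0

    def on_arrival(self, now_s: float, size_bytes: int) -> bool:
        """Account for one arriving packet; return True if it should be ECN-marked."""
        drain_bytes = self.link_rate_bps * self.rate_factor / 8 * (now_s - self.last_time_s)
        self.backlog_bytes = max(0.0, self.backlog_bytes - drain_bytes)
        self.last_time_s = now_s
        self.backlog_bytes += size_bytes
        return self.backlog_bytes > self.threshold_bytes


def run(rate_factor: float, n_packets: int = 100) -> list:
    """Feed 1500-byte packets at exactly 10 Mb/s; return indices of marked packets."""
    link_rate = 10e6
    interval = 1500 * 8 / link_rate  # 1.2 ms per packet at line rate
    vq = VirtualQueue(link_rate_bps=link_rate, rate_factor=rate_factor)
    return [k for k in range(n_packets) if vq.on_arrival(k * interval, 1500)]
```

Fed with back-to-back 1500-byte packets at exactly the link rate, a queue drained at the full rate (`run(1.0)`) never holds more than one packet and never marks, whereas the virtual queue at 0.95 grows by about 75 bytes per packet and starts marking after roughly 20 packets, i.e. it signals before any real queue builds.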

> Having the ability to signal multiple levels of
> "please  decelerate" is very intriguing. I think you believe that the
> transit operators are so rarely congested that you don't have to care
> (and you don't want to have to coordinate with them), but I believe
> that this should be demonstrated, not asserted. 100 GE is already being
> used for access uplink today.
I'm not actively "not caring" about transit operators, I'm just focusing 
my energy where I think it's most needed first.
Anyway, what I choose to do doesn't limit what everyone else does. 
Nonetheless, I have been helping core folks put in place the pieces 
needed early, so these next steps can be made.

> If you guys had agreed to isolate your L4S experiment's traffic with
> experimental DSCPs, people would have shrugged and said "go ahead" and
> you could have started your experiment two years ago with IETF's
> blessing and gotten real operational data by now. Of course, people are
> free to do whatever they want with their gear and their networks, IETF
> blessing or not, so if you can get operators to agree to play with your
> proposed ECN signaling, carry on. If vendors (e.g., Nokia) had not
> already implemented the proposed L4S ECN signaling in hardware and then
> tried to use that as an argument in support of ratifying such signaling
> in the IETF (and carrying it into 3GPP, FFS),
I'm not aware that anyone has ever used that as an argument. Companies 
are taking a risk to implement before the RFC is baked. That might look 
like applying pressure on others to go the same way. But I think that's 
in the eye of the beholder. They are the ones having to take the risk.

> then I wouldn't care what
> signaling is used *for the experiment*. As it is, I oppose changing the
> definition of ECN bits
Er, but your proposal changes the definition of ECN bits to SCE. Yes, 
it's theoretically conditional on an accompanying DSCP. But, to make it 
work, silicon in TCP offload hardware has to be updated. Not to mention 
silicon in tunnel decapsulators, silicon that does the ECN marking, the 
checksum recalculations, etc. Not a reversible experiment, either way.
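On the checksum point: any box that rewrites the ECN field of an IPv4 packet also has to patch the header checksum. A minimal Python sketch using the incremental update of RFC 1624 (HC' = ~(~HC + ~m + m')) follows; the helper names are invented for illustration, and the 20-byte sample header is an arbitrary example rather than traffic from any experiment.

```python
def ones_complement_add(a: int, b: int) -> int:
    """16-bit one's-complement addition with end-around carry (inputs <= 0xFFFF)."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def ipv4_checksum(header: bytes) -> int:
    """Full IPv4 header checksum; the checksum field must be zero in `header`."""
    total = 0
    for i in range(0, len(header), 2):
        total = ones_complement_add(total, (header[i] << 8) | header[i + 1])
    return ~total & 0xFFFF


def set_ecn(header: bytearray, ecn: int) -> None:
    """Rewrite the 2-bit ECN field in place and patch the checksum per RFC 1624 eqn. 3."""
    old_word = (header[0] << 8) | header[1]  # version/IHL byte + DSCP/ECN byte
    header[1] = (header[1] & 0xFC) | (ecn & 0x03)
    new_word = (header[0] << 8) | header[1]
    old_sum = (header[10] << 8) | header[11]
    s = ones_complement_add(~old_sum & 0xFFFF, ~old_word & 0xFFFF)
    s = ones_complement_add(s, new_word)
    new_sum = ~s & 0xFFFF
    header[10], header[11] = new_sum >> 8, new_sum & 0xFF
```

The incremental form touches only the changed 16-bit word and the checksum itself, which is why it suits per-packet marking in hardware better than summing the whole header again.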

I understand your concern. And you're entitled to take a stand. But it's 
not as black and white as you make out. It's a balance of a number of 
factors across the layers, including whether existing *applications* 
have to be changed.

That's why we asked the IETF from the start (well, after 2 years of our 
own experiments). And I have to say that none of the factors that went 
into the WG's decision to go with ECT1 as an identifier have changed.

> until L4S (or any other experiment) successfully
> concludes, especially not when a (IMHO) reasonable alternative exists.
As above, the "reasonable alternative" is flaky at best.

*If you hobble an experiment too much unnecessarily, you make it fail.*


> Argue about the optimal ECN semantics for Internet-wide deployment
> after you've gotten some operational experience.
> Regards,
> // Steve

Bob Briscoe