[re-ECN] re-echo of drop (was: Re: TCP's "Dynamic Range")

Bob Briscoe <rbriscoe@jungle.bt.co.uk> Thu, 29 October 2009 15:08 UTC

Date: Thu, 29 Oct 2009 15:08:54 +0000
To: Matt Mathis <matt.mathis@gmail.com>
From: Bob Briscoe <rbriscoe@jungle.bt.co.uk>
Cc: re-ecn@ietf.org

Matt,

I'm more sold than ever on the idea of incorporating explicit 
re-insertion of loss into the requirements for a protocol out of ConEx.

It's very important for incremental deployment. Of all the possible 
first steps to deployment, I think end-systems before networks is the 
most likely. OS & app authors have an incentive to declare that they 
are NOT the source of a large amount of congestion in a standard way. 
Then traffic mgmt boxes will be able to check the veracity of these 
claims and reward the traffic accordingly (without inspecting packets 
deeply). Consider Microsoft BITS (a precursor to LEDBAT) that is used 
to download Windows Update and could be used by anti-virus updates 
etc. Or filesharing using scavenger transports like LEDBAT.

It's fairly straightforward to do but I describe some wrinkles below, 
if I can be allowed to talk in terms of our specific re-ECN protocol 
spec (this documents a conversation Matt & I have been having offline):

For those who don't want the technical details, here's a summary:
1/ Since draft-00 (Oct 2005), the re-ECN spec has always said re-echo 
whether congestion is drop or ECN-marking.
2/ Re-ECN currently doesn't re-echo on drop unless the receiver is at 
least ECN-capable. There's an easy way to make that happen, though 
with a possible devil in the details.
3/ Yes, black hole detection is critical (to re-ECN on drop as much as to ECN).

==========================================================================
There are two scenarios with very different constraints. In both cases 
you have to have a re-ECN-capable sender (or proxy) to do the 
re-echoing. The distinction is whether the receiver is 
non-ECN-capable or at least ECN-capable (which includes re-ECN capable).

== 1/ At least ECN-capable receiver (ie ECN or re-ECN capable receiver) ==

Since draft-00, the re-ECN spec <draft-briscoe-tsvwg-re-ecn-tcp> 
S.6.1.1 has always said:
"                                               Whenever the ECI field
       increments by D (and/or d drops are detected), the sender MUST
       clear the RE flag to "0" in the IP header of the next D' data
       packets it sends
"
Indeed, explicitly reinserting a notification following a drop could 
be called reinserted explicit congestion notification. Strangely, the 
acronym for that would be re-ECN :)
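
To make the quoted rule concrete, here is a minimal sketch of the 
sender-side bookkeeping. It is not the draft's state machine. The 
class and method names are hypothetical, and it assumes D' = D + d 
(one re-echoed packet per ECN mark or detected drop) and that RE is 
normally set to 1 when there is nothing to re-echo, as the wording 
"clear the RE flag" suggests.

```python
class ReEchoSender:
    """Toy re-echo bookkeeping (hypothetical names, not from the draft)."""

    def __init__(self):
        # Number of upcoming data packets that must carry RE = 0.
        self.pending_re_echo = 0

    def on_feedback(self, eci_increment=0, drops_detected=0):
        # Assumption: D' = D + d, i.e. marks and drops are re-echoed alike.
        self.pending_re_echo += eci_increment + drops_detected

    def next_re_flag(self):
        """Return the RE flag (0 or 1) for the next data packet sent."""
        if self.pending_re_echo > 0:
            self.pending_re_echo -= 1
            return 0  # RE cleared: this packet re-echoes one congestion event
        return 1      # RE set: nothing outstanding to re-echo


sender = ReEchoSender()
sender.on_feedback(eci_increment=2, drops_detected=1)
print([sender.next_re_flag() for _ in range(5)])
```

With two marks and one drop fed back, the next three packets go out 
with RE cleared and the flag then returns to its default.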

We have recently realised there is an issue with re-echo on drop if 
the drops are from a policer (re-echoing unnecessarily gets the 
sender into deeper and deeper trouble). But I have a flakey solution 
to that, which we can work on writing in I-D language.

== 2/ Receiver that doesn't understand ECN or re-ECN. ==

The sender shouldn't use an ECN-capable codepoint for re-echoing if 
the receiver doesn't understand ECN. Otherwise, if these packets go 
through a congested ECN-capable router, they will get marked, but the 
receiver won't understand the ECN marks and will just ignore them.

For low levels of congestion, this seems like a corner case; it is 
only a problem for proportion p^2 of packets, where p is the loss 
fraction. So if p=1% the receiver misses a congestion notification on 
0.01% of packets. But if 20% of packets are being dropped in a 
pathological case, 20% of the sent packets will be made ECN-capable, 
so theoretically 4% of throughput could be ECN-marked and ignored by 
the receiver.
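
The p^2 arithmetic can be checked with a few lines. This is only a 
back-of-envelope sketch: the function name is made up, and it assumes 
one ECN-capable re-echo packet per lost packet and that the congested 
queue marks ECN-capable packets with the same probability p with 
which it drops the others.

```python
def ignored_mark_fraction(loss_fraction, mark_fraction=None):
    """Fraction of all sent packets that are both re-echo packets
    (made ECN-capable) and ECN-marked, so the mark is ignored by a
    receiver that does not understand ECN."""
    if mark_fraction is None:
        # Assumption: marking probability equals the loss fraction p.
        mark_fraction = loss_fraction
    re_echo_fraction = loss_fraction  # one re-echo packet per loss
    return re_echo_fraction * mark_fraction


for p in (0.01, 0.20):
    print(f"p = {p:.0%}: {ignored_mark_fraction(p):.2%} of packets")
```

The two cases reproduce the 0.01% and 4% figures above.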

Actually, at 20% loss levels, RFC3168 says a queue shouldn't still be 
ECN-marking, but the complexity around this debate is beyond the 
scope of this email...

One solution would be to define a non-ECN-capable codepoint for 
re-echoing drop. As there's only one non-ECN-capable codepoint of the 
ECN field (00), with only one bit available in IPv4 (the reserved 
flag or RE), there seems to be only one choice:
         ECN = 00, RE = 1

This combination is already defined in re-ECN as meaning "Cautious 
expectation of downstream congestion" or "I am declaring my 
expectation of congestion conservatively, because I've not seen 
enough recent feedback to know the path". This is intended for use at 
the start of a flow, or after an idle period.
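
For anyone wanting to see where these bits live, here is a rough 
sketch that pulls the ECN field and the reserved flag out of a raw 
IPv4 header and tests for the ECN = 00, RE = 1 combination. The 
function names are hypothetical. The bit positions are the standard 
ones: ECN is the two low-order bits of the second header byte, and 
the reserved flag is the top bit of the 16-bit flags/fragment-offset 
word.

```python
import struct


def ecn_and_re(ipv4_header: bytes):
    """Return (ecn_field, re_flag) from the first 20 bytes of an IPv4 header."""
    if len(ipv4_header) < 20:
        raise ValueError("need at least a 20-byte IPv4 header")
    tos = ipv4_header[1]                      # DS field + ECN field
    (flags_frag,) = struct.unpack("!H", ipv4_header[6:8])
    ecn = tos & 0b11                          # two low-order bits
    re_flag = (flags_frag >> 15) & 1          # reserved flag, used as RE
    return ecn, re_flag


def is_cautious_codepoint(ipv4_header: bytes) -> bool:
    """True for ECN = 00 with RE = 1 (hypothetical helper name)."""
    return ecn_and_re(ipv4_header) == (0b00, 1)


header = bytearray(20)
header[0] = 0x45        # version 4, header length 5 words
header[6] = 0x80        # reserved flag set, no fragmentation
print(ecn_and_re(bytes(header)), is_cautious_codepoint(bytes(header)))
```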

We would have to fully work out what it would break to overload this 
meaning with "I've seen actual congestion but it was drop and the 
receiver doesn't understand ECN."

There's also a wrinkle in that this combination can be considered 
ECN-capable by routers under strong conditions given in the draft. I 
think these conditions make it safe for re-echoing drop as well.

So this offers a significant ray of hope that this will be 
straightforward to specify and implement. But the devil may be in the details.

3/ End-point OSs & apps will only deploy re-ECN opportunistically 
(without any certain benefit from the network) if there is absolutely 
no downside. Therefore, we will need an aggressive programme to 
remove black holes for the use of the reserved flag (if that's what 
we go for), and code to do fast black-hole detection, learning from the 
experience with ECN, as Matt says.

I imagine a stds track RFC saying the reserved flag has been 
reassigned from reserved to experimental use (without saying what 
use). That can be used to beat equipment vendors over the head with, 
if they drop or clear that flag.

Experience with ECN showed that happened over 2-3yrs, but the 
residual problems of home gateways weren't ever attacked sufficiently 
hard. The deployment incentive for ECN was a small performance gain. 
The incentives for re-ECN could be a lot stronger.



Cheers


Bob

At 11:30 28/10/2009, Leslie Daigle wrote:
>[Vaguely suspecting you sent your message from a non-subscribed 
>e-mail address, as this doesn't appear to have come through on the 
>main mailing list.]
>
>I don't see any issue with talking about re-ECN and re-Feedback, and 
>being clear about the distinction, in the context of the BoF.  I'm 
>not sure how much airtime we need/want to spend on comparing and 
>contrasting them there, beyond ensuring that it is clear there is at 
>least one reasonable to pursue in a WG.
>
>In terms of supporting ECN deployment -- I'm not sure how much 
>airtime that requires at the BoF, beyond possible discussion of 
>whether it is a gating factor for viable progress, and whether there 
>is a work item on an eventual charter.  Is that 
>reasonable?  However, I think it is important to make sure it 
>remains a collateral point.
>
>Leslie.
>
>Matt Mathis wrote:
>>So I have two issues that I would like to see addressed, I think as
>>part of existing agenda items, but I want to be sure that they are not
>>ruled out of order by the anti-rat hole police.
>>First, I really think we need to broaden Re-ECN to RE-Feedback, even
>>under the still broader umbrella congestion exposure.  RE-ECN
>>presupposes the deployment of ECN, which has not deployed in 10 years.
>>  I believe that loss based RE-Feedback can provide much of the
>>benefits of RE-ECN, but with a far easier deployment strategy.
>>Furthermore unifying the semantics of the two approaches will make
>>both stronger.
>>My point is really that we need to view loss based and ECN based RE feedback
>>as equals, both likely to be present in steady state, and not one
>>merely as a deployment strategy for the other.  Our terminology should
>>reflect this.
>>Second, we need a task to actively promote ECN deployment.  Many
>>people have worked on this, and it has been all but vetoed by a
>>couple of persistent problems.   Although shepherding deployment is
>>not a traditional IETF activity, I really think it is the right venue
>>to bring the right people to bear on the problem.   We need to divide
>>the problem, such that improved technology in a couple of specific
>>areas will help the broader community to help us to eliminate the tiny
>>persistent problems that have thwarted ECN deployment.   I would
>>really like to see the technological part done under ConEX.
>>(The required technology is to make OS releases that are targeted for
>>technology users and beta testers run ECN black hole detection with
>>automatic diagnosis and reporting.  As we get more experience with
>>these algorithms and they get enabled for progressively wider pools of
>>users we will gradually eliminate the bugs in the infrastructure).
>>Thanks,
>>--MM--
>>-------------------------------------------
>>Matt Mathis      http://staff.psc.edu/mathis
>>Work:412.268.3319   Home/Cell:412.654.7529
>>-------------------------------------------
>>Evil is defined by mortals who think they know
>>"The Truth" and use force to apply it to others.
>>
>>On Tue, Oct 27, 2009 at 8:37 AM, Leslie Daigle 
>><leslie@thinkingcat.com> wrote:
>>>I like the principles a lot, too.
>>>
>>>I'm thinking that could be an excellent jumping off point for "constraints",
>>>in the agenda.   Phil's currently slated to do that section.  Are you (John)
>>>going to be in Hiroshima?  And/or, can you work with Phil to integrate these
>>>points into the discussion-leading material?
>>>
>>>Leslie.
>>>
>>>Bob Briscoe wrote:
>>>>John,
>>>>
>>>>I really like your list of principles. Where in the BoF do you suggest we
>>>>put it?
>>>>
>>>>And I agree with everything you've said in this email, with just a couple
>>>>of inline comments...
>>>>
>>>>At 13:36 26/10/2009, John Leslie wrote:
>>>>>Bob Briscoe <rbriscoe@jungle.bt.co.uk> wrote:
>>>>>>At 22:06 25/10/2009, Leslie Daigle wrote:
>>>>>>>Bob Briscoe wrote:
>>>>>>>>I've been thinking... We should add an item to the many purposes
>>>>>>>>list:
>>>>>>>>
>>>>>>>>- evolution path beyond TCP (running out of dynamic range)
>>>>>>>Which is pretty cool, but on the bof agenda might lead to some
>>>>>>>ratholing on whether we're just bashing TCP, no?
>>>>>>[BB] Not at all. Everyone in the transport area knows TCP has run out
>>>>>>of dynamic range. It's a well-known problem.
>>>>>   I've been thinking somewhat along this line, though I wasn't using
>>>>>the term "dynamic range", but rather considering signal-to-noise in
>>>>>our feedback function.
>>>>>
>>>>>   Beyond any question, TCP has been successful at curing "congestion
>>>>>collapse" -- which is, after all, what it aimed to do.
>>>>>
>>>>>   But packet loss, as a feedback signal, is frankly terribly unsuited
>>>>>to fine-tuning how to share bandwidth at the bottlenecks. It's not,
>>>>>IMHO, "bashing" TCP to point out this difference.
>>>>>
>>>>>   Network operators, at best, can only estimate packet loss, not
>>>>>measure it meaningfully, and even if they could measure it with 100%
>>>>>accuracy, they'd have no hope of relating packet loss to the backoff
>>>>>it signals.
>>>>>
>>>>>   That's because the multiplicative decrease it signals is based
>>>>>on end-to-end flows at a higher layer. We're looking at five or more
>>>>>orders of magnitude difference in the amount of backoff signaled by
>>>>>a packet lost. For a network operator, the signal is quite lost in
>>>>>the noise!
>>>>>
>>>>>   As we review the history of TCP, we notice that attempts to move
>>>>>away from packet loss have failed to dependably avoid congestion
>>>>>collapse: thus implementations tend to depend entirely on packet
>>>>>loss to signal backoff; and other congestion-notification tends to
>>>>>be ignored or even suppressed.
>>>>>
>>>>>   (BTW, that's an issue we need to be prepared to discuss: how can
>>>>>re-ecn operate when ECN marks are suppressed? Even though most of
>>>>>the suppression history concerns ICMP, there will be folks who think
>>>>>ECN will suffer similar suppression.)
>>>>As this sentence is in the passive, I assume you mean suppression by the
>>>>transport or some other link than the congested one (not suppression by the
>>>>congested link itself).
>>>>
>>>>That's why we brought re-ECN to the IETF - because we had solved that
>>>>problem. The draft-briscoe-tsvwg-re-ecn-tcp-motivation-01.txt explains the
>>>>mechanisms that can be built over re-ECN to detect & prevent suppression.
>>>>
>>>>I've actually got a PhD thesis of proofs on this now, with pseudocode of
>>>>the mechanisms, simulations etc etc, but I haven't got round to asking my
>>>>company for permission to publish (mainly because they are more focused on
>>>>sacking people than authorising publications at the mo). I promise it will
>>>>be published soon.
>>>>
>>>>
>>>>>   We should recognize that multiplicative-decrease on packet loss
>>>>>is a proven winner for avoiding congestion collapse, and concentrate
>>>>>on what's a better signal for management of bottleneck bandwidth.
>>>>>I propose a few principles:
>>>>>
>>>>>1) The signal needs to be visible to the network manager managing
>>>>>   the bottleneck;
>>>>>
>>>>>2) The signal should be distinct enough to be the basis for cost
>>>>>   allocation for upgrading the bottleneck;
>>>>>
>>>>>3) The signal should be visible to end-systems, giving a decent
>>>>>   measure of how much to backoff their sending rate;
>>>>>
>>>>>4) The signal should enable "backpressure" to allow network managers
>>>>>   to avoid forwarding too many packets towards the bottleneck;
>>>>>
>>>>>5) The signal should not involve packet loss.
>>>>>
>>>>>   I believe that a properly-designed signalling system can work
>>>>>for at least eight or nine orders of magnitude of sender bandwidth.
>>>>>To be complete, a proposal should probably get into how many bits
>>>>>per signal, but I'm personally convinced that re-ecn can work
>>>>>beyond five orders of magnitude.
>>>>Have you been following Matt Mathis's work on Relentless TCP? And
>>>>generally on TCP algos with window proportional to 1/p, rather than
>>>>1/sqrt(p) like current TCP. The idea is these maintain the same number of
>>>>loss or ECN signals per window however fast you go.
>>>>
>>>>Is there some reason for choosing 8 or 9 orders of magnitude? I would have
>>>>thought 1/p would scale indefinitely, but you may be thinking of other
>>>>factors I've missed.
>>>>
>>>>Scaling was also one of the main motivations for Kelly's primal algo. And
>>>>it was one of my motivations for introducing re-ECN so we could shift from
>>>>the TCP-friendly (1/sqrt(p)) track painlessly onto a scalable 1/p track
>>>>without worrying about flow fairness.
>>>>
>>>>
>>>>Bob
>>>>
>>>>
>>>>>   So, avoiding the bits-per-signal question, are the five principles
>>>>>above sufficient? Are they necessary? Can we come up with a good
>>>>>presentation of such principles for the BoF?
>>>>>
>>>>>--
>>>>>John Leslie <john@jlc.net>
>>>>________________________________________________________________
>>>>Bob Briscoe,                                BT Innovate & Design
>>>--
>>>
>>>-------------------------------------------------------------------
>>>"Reality:
>>>     Yours to discover."
>>>                                -- ThinkingCat
>>>Leslie Daigle
>>>leslie@thinkingcat.com
>>>-------------------------------------------------------------------
>>>_______________________________________________
>>>re-ECN mailing list
>>>re-ECN@ietf.org
>>>https://www.ietf.org/mailman/listinfo/re-ecn
>

________________________________________________________________
Bob Briscoe,                                BT Innovate & Design