Re: [tsvwg] Review of draft-carlberg-tsvwg-ecn-reactions-03

Bob Briscoe <bob.briscoe@bt.com> Tue, 23 October 2012 18:49 UTC

Message-ID: <201210231849.q9NIngtU021236@bagheera.jungle.bt.co.uk>
Date: Tue, 23 Oct 2012 19:49:46 +0100
To: Piers O'Hanlon <p.ohanlon@gmail.com>
From: Bob Briscoe <bob.briscoe@bt.com>
In-Reply-To: <89AE31B0-17D3-4526-BDF9-A4CE9578B5F1@gmail.com>
References: <201210191729.q9JHTarm031903@bagheera.jungle.bt.co.uk> <89AE31B0-17D3-4526-BDF9-A4CE9578B5F1@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format="flowed"
Cc: Ken Carlberg <ken.carlberg@gmail.com>, tsvwg IETF list <tsvwg@ietf.org>
Subject: Re: [tsvwg] Review of draft-carlberg-tsvwg-ecn-reactions-03
Precedence: list

Piers,

At 16:19 23/10/2012, Piers O'Hanlon wrote:
>Hi Bob,
>
>Thanks for the review. Comments inline.
>
>On 19 Oct 2012, at 18:29, Bob Briscoe wrote:
>
> > Ken & Piers,
> >
> > I finally got round to reviewing this draft.


> > S.3.1
> > "  TFRC ... is seeing ... deployment in .... Empathy/Farsight, 
> and GoogleTalk [googl].
> >
> >   However it should be noted that TFRC is only recommended for real-time
> >   media use with ECN response. TFRC is not recommended for non-ECN paths
> >   due to its loss based operation which leads to full queues with
> >   maximised latencies.
> > "
> >
> > Are you saying Empathy/Farsight and GoogleTalk only recommend 
> TFRC if with ECN, or is this document doing the recommending?
> >
>We're just saying that TFRC has seen some [partial] deployment in 
>Empathy/Farsight and GoogleTalk. We suggested in the draft that if 
>you're going to use TFRC then you should use TFRC with ECN.

[BB]: My point was merely to suggest you clarify the text - putting 
'recommended' in the passive voice makes it unclear whether you or 
Google are doing the recommending.

> > "It is assumed that ECN markings will usually occur
> >   with lower queue occupancy and thus lower latency. "
> >
> > That can't be right. RFC3168 insists that an AQM signals ECN 
> where it would otherwise have signalled a drop for a non-ECN 
> packet. There is only one queue for them both, so the occupancy has 
> to be the same for both to satisfy RFC3168.
> >
> > The whole point of the RFC3168 requirement is to ensure that 
> non-ECN cannot starve ECN traffic. If an AQM were to mark ECN 
> traffic at a shorter queue length, then non-ECN traffic would drive 
> up the (same) queue length to the larger non-ECN operating point, 
> and the algo would be marking ECN traffic with a lot higher 
> probability than non-ECN. ECN would then starve itself given 
> RFC3168 insists an ECN transport is meant to react to a mark as if 
> it were a drop.
> >
> > This assumption seems to be the justification for TFRC being 
> appropriate for ECN, where it might not be for non-ECN. If the 
> assumption isn't correct, where does that leave the main message of 
> the document?
> >
>Sure - when comparing an ECN enabled RFC3168 compliant queue with 
>and without ECN. However it depends whether the queue complies with 
>RFC3168 - there are a number of alternative approaches using ECN but 
>not compliant to RFC3168 - such as DCTCP and others. Also there is 
>quite a bit of leeway in the definition of the AQM behaviour in RFC3168.

[BB]: I should have been clearer. The requirement of RFC3168 for the 
queue to behave equivalently for drop and ECN packets isn't just a 
quirk of RFC3168, it's very hard (impossible?) to avoid starvation of 
one or the other (ECN or non-ECN) unless the queue uses the same 
algorithm and parameters for both.
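
To make the starvation argument concrete, here is a minimal sketch. It 
assumes a simplified RED-like linear ramp, and the thresholds are made 
up for illustration (none of these numbers come from RFC3168 or the 
draft). It compares the signal probability seen by ECN and non-ECN 
packets in one shared queue, first with identical parameters and then 
with ECN marked at a shorter queue length:

```python
def red_probability(queue_len, min_th, max_th, max_p):
    """Simplified RED-like ramp: 0 below min_th, linear up to max_p at
    max_th, and 1 at or above max_th. An illustrative assumption only."""
    if queue_len < min_th:
        return 0.0
    if queue_len >= max_th:
        return 1.0
    return max_p * (queue_len - min_th) / (max_th - min_th)


# Hypothetical parameters, in packets.
DROP_PARAMS = dict(min_th=20, max_th=60, max_p=0.1)
EARLY_ECN_PARAMS = dict(min_th=5, max_th=15, max_p=0.1)


def signal_probabilities(queue_len, ecn_params, drop_params):
    """Return (ECN mark probability, non-ECN drop probability) for one
    shared queue at the given instantaneous length."""
    return (red_probability(queue_len, **ecn_params),
            red_probability(queue_len, **drop_params))


# Assume non-ECN traffic has driven the shared queue to 30 packets.
QUEUE_LEN = 30
same = signal_probabilities(QUEUE_LEN, DROP_PARAMS, DROP_PARAMS)
split = signal_probabilities(QUEUE_LEN, EARLY_ECN_PARAMS, DROP_PARAMS)

print("same parameters  (mark, drop):", same)
print("early ECN        (mark, drop):", split)
```

With the same parameters both classes see the same probability. With 
the hypothetical early-ECN thresholds, the queue length that non-ECN 
traffic settles at is already past the ECN ramp, so ECN packets are 
marked far more often than non-ECN packets are dropped.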

>  But if one is comparing an RFC3168 queue using ECN against 
> a drop-tail queue then the ECN/RFC3168 queue is generally more 
> favourable to real-time flows due to the probabilistic nature of 
> the marking/dropping.

[BB]: Eh? There aren't two queues. There's only one. If there were 
two queues, it would be hard (impossible?) to know which one to serve 
by how much given packets sometimes arriving as ECN and sometimes 
not - I suspect even an adaptive approach would just end up looking 
like a single FIFO.

I gave a presentation at the last IETF (Transport Area Plenary) about 
a possible way to get DCTCP/ECN to interwork with classic non-ECN 
traffic (well I would have, but I was cut short due to severe overrun 
of the previous speaker). Here's what I would have said:
<http://www.ietf.org/proceedings/84/slides/slides-84-tsvarea-3.pdf>

>But to be fair you have a point in the case of a direct comparison 
>of RFC3168 ECN or drop.  I think that the more important aspect of 
>using ECN is that it provides for back-off before loss occurs - 
>allowing for flows to adapt earlier and minimise loss.

[BB]: There is a more subtle sense in which ECN reduces latency. I 
just wrote it into the opening para of this draft I just posted:
<http://tools.ietf.org/html/draft-briscoe-tsvwg-ecn-encap-guidelines-01>

"  as ECN is
    used more widely by end-systems, it will gradually remove the need to
    configure a degree of delay into buffers before they start to notify
    congestion (the cause of bufferbloat).  The latter delay is because
    drop involves a trade-off between sending a timely signal and trying
    to avoid impairment, whereas ECN is solely a signal so there is no
    harm triggering it earlier.
"
This is an emergent property that would only happen if ECN became 
predominant, so that the optimum point to set AQM parameters was 
determined mostly by ECN users, not legacy (drop) users.

> > S.5.2 Fault Tolerance
> > I don't see why either redundant (duplicate) transmission or FEC 
> are relevant to a document about using ECN, as opposed to loss, for 
> congestion signalling.
> >
>Well one could take a variety of actions on the reception of ECN (or 
>loss) - the 3GPP specs suggest some interesting things... We've 
>mentioned them as they're one of many possibilities.

[BB]: I suggest there's no need to repeat things other people say 
unless they make sense. I still don't understand why these things make sense.


> > S.5.3
> > I wholly agree that flows that consider themselves important 
> could use a substantially less aggressive back-off. But complete 
> exemption is of great concern to me. Therein lies a congestion 
> collapse. There is no need for complete exemption, if a flow can 
> respond very little to congestion. At least then if congestion gets 
> really bad, it will still respond a lot. But if congestion is not 
> so bad, it will seem pretty much unresponsive.
> >
>True, complete exemption probably should not be generally used.

[BB]: I agree, if you delete 'generally' :)


> > I know you report experiments where ECN exemption has limited 
> effect on normal flows, but the experiments assume low numbers of 
> exempt flows relative to non-exempt. Each individual flow cannot 
> know whether that assumption holds, so making this assumption is a 
> recipe for boiling a network during an emergency - just when we need it most.
> >
>The simulations are of flows that have finite rate limits - so the 
>exempt can take all the bandwidth they need, leaving the rest to be 
>shared amongst the non-exempt flows.

[BB]: As I said, this depends on an assumption about proportions.
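
A rough sketch of that dependence, taking the 100Mb/s backhaul and 
1Mb/s access rate from the scenario described further down as the 
capacity and the per-flow rate limit. The flow counts are hypothetical, 
and the model simply assumes exempt flows always send at their limit 
while responsive flows split whatever is left:

```python
def share_for_responsive(capacity, exempt_flows, exempt_rate,
                         responsive_flows):
    """Return (per-flow rate left for responsive flows, overload that
    nobody backs off from). All rates are in Mb/s.

    Simplifying assumptions: each exempt flow always sends at
    exempt_rate, and responsive flows share the remainder equally.
    """
    exempt_demand = exempt_flows * exempt_rate
    leftover = max(0.0, capacity - exempt_demand)
    per_flow = leftover / responsive_flows if responsive_flows else 0.0
    overload = max(0.0, exempt_demand - capacity)
    return per_flow, overload


CAPACITY = 100.0    # backhaul link, Mb/s
EXEMPT_RATE = 1.0   # per-flow limit set by the access link, Mb/s

# Hypothetical mixes of exempt and responsive flows.
few_exempt = share_for_responsive(CAPACITY, 10, EXEMPT_RATE, 90)
many_exempt = share_for_responsive(CAPACITY, 150, EXEMPT_RATE, 90)

print("10 exempt, 90 responsive :", few_exempt)
print("150 exempt, 90 responsive:", many_exempt)
```

With few exempt flows the responsive flows still get a usable share. 
Once the exempt flows alone exceed the capacity, the responsive flows 
are squeezed to nothing and the remaining overload can only be resolved 
by loss, which no individual exempt flow can detect in advance.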

> > S.5.3. penultimate para:
> > I don't see how the normal flows can have a different latency 
> from the exempt flows. If they all share the same queue, they all 
> see the same latency at any one time. Certainly the average latency 
> may be different for long-running flows that are present during 
> times when shorter flows are there and when they are not. But the 
> _instantaneous_ latency will be the same for any flows sharing a single queue.
> >
>Yes, the delay incurred on the shared bottleneck queue is the same 
>for all flows; however, the relatively 'higher delay' seen by the exempt 
>flows is actually mostly in the sender node transmit queue 
>(which is the normal situation in this general scenario). That is 
>where queuing should be occurring most of the time as the access 
>link (1Mb/s) is lower than the backhaul link (100Mb/s) - it is in 
>effect the bottleneck queue. So the situation of higher delays is 
>actually the norm and only when ECN marking kicks in on the backhaul 
>link does it actually lead to reduced delays for those flows 
>responding to the marking - reducing their rate - and thus their 
>delay. But the exempt flows that don't react to the marking continue 
>with their normal higher delay. The queue in the backhaul link does 
>build once the combined rates of the nodes exceed the backhaul 
>capacity but as the other nodes are ECN enabled they reduce their 
>rate so that queue doesn't build beyond the [RED] queue marking thresholds.

[BB]: OK, I can see the point. I assume you are interpreting the 
results of a test or simulation here. I guess what you've actually 
got is a bottleneck shuttling between two points, so the explanation 
is a bit more complicated, but I get the idea.

>Thanks,
>
>Piers.

Cheers


Bob



________________________________________________________________
Bob Briscoe,                                BT Innovate & Design
