Re: [tsvwg] Review of draft-carlberg-tsvwg-ecn-reactions-03

Piers O'Hanlon <p.ohanlon@gmail.com> Tue, 23 October 2012 21:08 UTC

From: Piers O'Hanlon <p.ohanlon@gmail.com>
In-Reply-To: <201210231849.q9NIngtU021236@bagheera.jungle.bt.co.uk>
Date: Tue, 23 Oct 2012 22:08:24 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <4F541946-BEB3-45C4-8167-349322CEA8F4@gmail.com>
References: <201210191729.q9JHTarm031903@bagheera.jungle.bt.co.uk> <89AE31B0-17D3-4526-BDF9-A4CE9578B5F1@gmail.com> <201210231849.q9NIngtU021236@bagheera.jungle.bt.co.uk>
To: Bob Briscoe <bob.briscoe@bt.com>
X-Mailer: Apple Mail (2.1283)
Cc: Ken Carlberg <ken.carlberg@gmail.com>, tsvwg IETF list <tsvwg@ietf.org>
Subject: Re: [tsvwg] Review of draft-carlberg-tsvwg-ecn-reactions-03

Bob,

On 23 Oct 2012, at 19:49, Bob Briscoe wrote:

> Piers,
> 
> At 16:19 23/10/2012, Piers O'Hanlon wrote:
>> Hi Bob,
>> 
>> Thanks for the review. Comments inline.
>> 
>> On 19 Oct 2012, at 18:29, Bob Briscoe wrote:
>> 
>> > Ken & Piers,
>> >
>> > I finally got round to reviewing this draft.
> 
> 
>> > S.3.1
>> > "  TFRC ... is seeing ... deployment in .... Empathy/Farsight, and GoogleTalk [googl].
>> >
>> >   However it should be noted that TFRC is only recommended for real-time
>> >   media use with ECN response. TFRC is not recommended for non-ECN paths
>> >   due to its loss based operation which leads to full queues with
>> >   maximised latencies.
>> > "
>> >
>> > Are you saying Empathy/Farsight and GoogleTalk only recommend TFRC if with ECN, or is this document doing the recommending?
>> >
>> We're just saying that TFRC has seen some [partial] deployment in Empathy/Farsight and GoogleTalk. We suggested in the draft that if you're going to use TFRC then you should use TFRC with ECN.
> 
> [BB]: My point was merely to suggest you clarify the text - putting 'recommended' in the passive voice makes it unclear whether you or Google are doing the recommending.
> 
Sure.

>> > "It is assumed that ECN markings will usually occur
>> >   with lower queue occupancy and thus lower latency. "
>> >
>> > That can't be right. RFC3168 insists that an AQM signals ECN where it would otherwise have signalled a drop for a non-ECN packet. There is only one queue for them both, so the occupancy has to be the same for both to satisfy RFC3168.
>> >
>> > The whole point of the RFC3168 requirement is to ensure that non-ECN cannot starve ECN traffic. If an AQM were to mark ECN traffic at a shorter queue length, then non-ECN traffic would drive up the (same) queue length to the larger non-ECN operating point, and the algo would be marking ECN traffic with a lot higher probability than non-ECN. ECN would then starve itself given RFC3168 insists an ECN transport is meant to react to a mark as if it were a drop.
>> >
>> > This assumption seems to be the justification for TFRC being appropriate for ECN, where it might not be for non-ECN. If the assumption isn't correct, where does that leave the main message of the document?
>> >
>> Sure - when comparing an ECN enabled RFC3168 compliant queue with and without ECN. However it depends whether the queue complies with RFC3168 - there are a number of alternative approaches using ECN but not compliant to RFC3168 - such as DCTCP and others. Also there is quite a bit of leeway in the definition of the AQM behaviour in RFC3168.
> 
> [BB]: I should have been clearer. The requirement of RFC3168 for the queue to behave equivalently for drop and ECN packets isn't just a quirk of RFC3168, it's very hard (impossible?) to avoid starvation of one or the other (ECN or non-ECN) unless the queue uses the same algorithm and parameters for both.
> 
Agreed.
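
To make the single-queue point concrete, here is a minimal sketch of a RED-style AQM in which the decision to signal congestion is taken once, from the shared average queue length, and only the form of the signal (mark or drop) depends on whether the packet is ECN-capable. The function names, thresholds and `max_p` value are illustrative assumptions, not taken from the draft or from RFC3168, and the behaviour above the maximum threshold is simplified.

```python
import random

def red_probability(avg_queue, min_th, max_th, max_p):
    """Linear RED-style signalling probability from the average queue length."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        # Simplification: always signal above max_th.
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

def handle_packet(avg_queue, ect, rng, min_th=5, max_th=15, max_p=0.1):
    """Return 'forward', 'mark' or 'drop' for one arriving packet.

    Whether to signal congestion does not depend on ECN capability (ect);
    only the form of the signal does. Thresholds are hypothetical values.
    """
    p = red_probability(avg_queue, min_th, max_th, max_p)
    if rng.random() >= p:
        return "forward"
    return "mark" if ect else "drop"

if __name__ == "__main__":
    rng = random.Random(1)
    decisions = [handle_packet(10, ect=True, rng=rng) for _ in range(1000)]
    print("fraction marked:", decisions.count("mark") / len(decisions))
```

With the same queue state, an ECN-capable packet and a non-ECN packet see the same signalling probability, which is the equivalence being discussed.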

>> But if one is comparing an RFC3168 queue using ECN against a drop-tail queue then the ECN/RFC3168 queue is generally more favourable to real-time flows due to the probabilistic nature of the marking/dropping.
> 
> [BB]: Eh? There aren't two queues. There's only one. If there were two queues, it would be hard (impossible?) to know which one to serve by how much given packets sometimes arriving as  ECN and sometimes not - I suspect even an adaptive approach would just end up looking like a single FIFO.
> 
I was talking about a more general comparison, where one might consider running TFRC on a drop-tail based network versus one that runs ECN-enabled queues.

> I gave a presentation at the last IETF (Transport Area Plenary) about a possible way to get DCTCP/ECN to interwork with classic non-ECN traffic (well I would have, but I was cut short due to severe overrun of the previous speaker). Here's what I would have said:
> <http://www.ietf.org/proceedings/84/slides/slides-84-tsvarea-3.pdf>
> 
>> But to be fair you have a point in the case of a direct comparison of RFC3168 ECN or drop.  I think that the more important aspect of using ECN is that it provides for back-off before loss occurs - allowing flows to adapt earlier and minimise loss.
> 
> [BB]: There is a more subtle sense in which ECN reduces latency. I just wrote it into the opening para of this draft I just posted:
> <http://tools.ietf.org/html/draft-briscoe-tsvwg-ecn-encap-guidelines-01>
> 
> "  as ECN is
>   used more widely by end-systems, it will gradually remove the need to
>   configure a degree of delay into buffers before they start to notify
>   congestion (the cause of bufferbloat).  The latter delay is because
>   drop involves a trade-off between sending a timely signal and trying
>   to avoid impairment, whereas ECN is solely a signal so there is no
>   harm triggering it earlier.
> "
> This is an emergent property that would only happen if ECN became predominant, so that the optimum point to set AQM parameters was determined mostly by ECN users, not legacy (drop) users.
> 
Ok sounds good - though isn't one still limited by the BDP queue size needed to accommodate TCP flows?
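
For reference, the BDP figure in question is just link rate times round-trip time. A minimal sketch of the arithmetic follows; the 100 Mb/s rate matches the backhaul link mentioned later in this thread, while the 100 ms RTT is purely an assumed value for illustration.

```python
def bdp_bytes(link_rate_bps, rtt_seconds):
    """Bandwidth-delay product in bytes for a given link rate and RTT."""
    return link_rate_bps * rtt_seconds / 8

if __name__ == "__main__":
    # Assumed example values: 100 Mb/s link, 100 ms round-trip time.
    print("BDP in bytes:", bdp_bytes(100_000_000, 0.1))
```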

>> > S.5.2 Fault Tolerance
>> > I don't see why either redundant (duplicate) transmission or FEC are relevant to a document about using ECN, as opposed to loss, for congestion signalling.
>> >
>> Well one could take a variety of actions on the reception of ECN (or loss) - the 3GPP specs suggest some interesting things... We've mentioned them as they're one of many possibilities.
> 
> [BB]: I suggest there's no need to repeat things other people say unless they make sense. I still don't understand why these things make sense.
> 
You don't see why FEC makes sense as a congestion reaction (or you're not keen on FEC in general)?

> 
>> > S.5.3
>> > I wholly agree that flows that consider themselves important could use a substantially less aggressive back-off. But complete exemption is of great concern to me. Therein lies a congestion collapse. There is no need for complete exemption, if a flow can respond very little to congestion. At least then if congestion gets really bad, it will still respond a lot. But if congestion is not so bad, it will seem pretty much unresponsive.
>> >
>> True, complete exemption probably should not be generally used.
> 
> [BB]: I agree, if you delete 'generally' :)
> 
I thought you might say that!
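
As a rough illustration of the difference between "responds very little" and "exempt", here is a sketch of a flow whose rate reduction is proportional to the fraction of marked packets, scaled by a small weight `beta`. This proportional rule and the numbers are assumptions chosen for illustration only; they are not the response defined in the draft or in TFRC.

```python
def next_rate(rate, mark_fraction, beta):
    """One update step: scale the rate down in proportion to the marked fraction.

    beta = 0 models a fully exempt flow; a small beta models a weak responder.
    """
    return rate * (1.0 - beta * mark_fraction)

def rate_after(rate, mark_fraction, beta, steps):
    """Apply next_rate repeatedly at a constant marking level."""
    for _ in range(steps):
        rate = next_rate(rate, mark_fraction, beta)
    return rate

if __name__ == "__main__":
    start = 1_000_000  # hypothetical starting rate in bit/s
    for p in (0.01, 1.0):
        print("mark fraction", p,
              "weak:", rate_after(start, p, beta=0.1, steps=10),
              "exempt:", rate_after(start, p, beta=0.0, steps=10))
```

Under light marking the weak responder is almost indistinguishable from the exempt flow, but under heavy marking it still backs off substantially, whereas the exempt flow never does.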

> 
>> > I know you report experiments where ECN exemption has limited effect on normal flows, but the experiments assume low numbers of exempt flows relative to non-exempt. Each individual flow cannot know whether that assumption holds, so making this assumption is a recipe for boiling a network during an emergency - just when we need it most.
>> >
>> The simulations are of flows that have finite rate limits - so the exempt can take all the bandwidth they need, leaving the rest to be shared amongst the non-exempt flows.
> 
> [BB]: As I said, this depends on an assumption about proportions.
> 
Sure.

>> > S.5.3. penultimate para:
>> > I don't see how the normal flows can have a different latency from the exempt flows. If they all share the same queue, they all see the same latency at any one time. Certainly the average latency may be different for long-running flows that are present during times when shorter flows are there and when they are not. But the _instantaneous_ latency will be the same for any flows sharing a single queue.
>> >
>> Yes, the delay incurred on the shared bottleneck queue is the same for all flows. However, the relatively 'higher delay' seen by the exempt flows is actually mostly in the sender node transmit queue (which is the normal situation in this general scenario). That is where queuing should be occurring most of the time, as the access link (1Mb/s) is slower than the backhaul link (100Mb/s) - it is in effect the bottleneck queue. So the situation of higher delays is actually the norm, and only when ECN marking kicks in on the backhaul link does it lead to reduced delays for those flows responding to the marking, since reducing their rate also reduces their delay. But the exempt flows that don't react to the marking continue with their normal higher delay. The queue in the backhaul link does build once the combined rates of the nodes exceed the backhaul capacity, but as the other nodes are ECN enabled they reduce their rate so that the queue doesn't build beyond the [RED] queue marking thresholds.
> 
> [BB]: OK, I can see the point. I assume you are interpreting the results of a test or simulation here. I guess what you've actually got is a bottleneck shuttling between two points, so the explanation is a bit more complicated, but I get the idea.
> 
Right.
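
To put rough numbers on why most of the delay sits in the sender's transmit queue, here is a small sketch of the queuing delay for the same backlog on the two links. The 1 Mb/s and 100 Mb/s rates are the ones quoted above; the backlog of 10 packets and the 1500-byte packet size are assumed values, not figures from the simulations.

```python
def queueing_delay_ms(backlog_packets, packet_bytes, link_rate_bps):
    """Time in milliseconds to drain a backlog over a link of the given rate."""
    return backlog_packets * packet_bytes * 8 / link_rate_bps * 1000

if __name__ == "__main__":
    # Assumed backlog: 10 packets of 1500 bytes on each link.
    print("access (1 Mb/s):", queueing_delay_ms(10, 1500, 1_000_000), "ms")
    print("backhaul (100 Mb/s):", queueing_delay_ms(10, 1500, 100_000_000), "ms")
```

The same backlog costs a hundred times more delay on the access link, so a flow that keeps its sender queue full sees that delay regardless of what the backhaul queue is doing.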

Cheers,

Piers

>> Thanks,
>> 
>> Piers.
> 
> Cheers
> 
> 
> Bob
> 
> 
> 
> ________________________________________________________________
> Bob Briscoe,                                BT Innovate & Design