Re: [AVTCORE] [rtcweb] [tsvwg] WG Last Call on changes: draft-ietf-avtcore-rtp-circuit-breakers-16

Michael Welzl <michawe@ifi.uio.no> Mon, 27 June 2016 22:29 UTC

From: Michael Welzl <michawe@ifi.uio.no>
Date: Tue, 28 Jun 2016 00:29:26 +0200
To: Colin Perkins <csp@csperkins.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/9DhIy5S5OC1m660Fmo8q-O9fW0g>
Cc: "Black, David" <david.black@emc.com>, "De Schepper, Koen \(Nokia - BE\)" <koen.de_schepper@nokia-bell-labs.com>, "rtcweb@ietf.org" <rtcweb@ietf.org>, tsvwg <tsvwg@ietf.org>, IETF AVTCore WG <avt@ietf.org>
Subject: Re: [AVTCORE] [rtcweb] [tsvwg] WG Last Call on changes: draft-ietf-avtcore-rtp-circuit-breakers-16

> On 28. jun. 2016, at 00.02, Colin Perkins <csp@csperkins.org> wrote:
> 
> 
>> On 27 Jun 2016, at 21:52, Michael Welzl <michawe@ifi.uio.no> wrote:
>> 
>> David,
>> 
>> 
>>> On 27. jun. 2016, at 22.09, Black, David <david.black@emc.com> wrote:
>>> 
>>>> As long as an AQM is marking at the same rate as dropping
>>> 
>>> That's an interesting assumption - it should be true for AQMs vetted
>>> here in the past, but there are easy ways for it not to hold (e.g., if dropping
>>> or marking is based on queue occupancy, it is possible that dropping
>>> reduces queue occupancy in a fashion that marking does not).
>>> 
>>> For ECN "classic" (see RFC 3168), ECN-CE markings are treated as
>>> drop-equivalent, but that is for congestion control purposes, which is
>>> similar to (but not the same as) the throughput estimation usage in the
>>> RTP circuit breaker. I'll note that ECN "classic" was designed for
>>> congestion control algorithms that react to ECN-CE marks once per RTT,
>>> independent of how many ECN-CE marks are observed in an RTT.
>>> 
>>> Gorry wrote:
>>> 
>>>>> in this context we should use ECN to drive a CC algorithm and we should be
>>>>> cautious to avoid requiring its use within a Circuit Breaker - optional
>>>>> use, if you understand how to interpret a reaction to many CE-marks as
>>>>> excessive congestion, is permitted.
>>> 
>>> Something like that may be workable, starting with a clear distinction between
>>> the use of ECN by CC (routine, active at all times) and ECN by a circuit
>>> breaker (monitors for evidence that things have gotten bad, only activated
>>> when things get bad).   This would baseline the RTP circuit breaker on drops
>>> and allow use of ECN as additional evidence of problems, in contrast to
>>> congestion control where ECN-CE is effectively treated as drop-equivalent.
>>> 
>>> I'm not quite sure how to specify "use of ECN as additional evidence" of
>>> "excessive congestion" as drop-equivalence is about the best we have
>>> for current guidance.
>> 
>> I fail to parse that sentence, so maybe I’m getting you wrong, but anyway I wonder: what’s even the point of this?
>> Why even bother considering CE-marks as information for a circuit breaker?
> 
> Because the alternative is that we only break the circuit once the queue has been driven into overflow, and packets have been lost. We want to avoid that, since it causes latency, and too much latency is very bad for the user experience. 

Well - the better way out would be for the application to react. Maybe this is me misunderstanding the circuit breaker, but I did think it’s more like a last resort… you just don’t want to be trigger-happy with such a thing?


>> CE-marks may *not* indicate *excessive* congestion - and since you say “additional evidence”: I don’t think that a combination of loss and CE-marks makes this any better? CE-marks may be produced by a shallow queue, which can be rather “mild” congestion, at least in the light of what a circuit breaker should consider…
> 
> Surely this is just arguing for a different threshold for a circuit breaker triggered by ECN-CE marks (using a modern, small queue, AQM) than for one triggered by loss (or ECN marks considered equivalent to loss)? 

If you have room for yet another code point, for the circuit breaker only? :) Or maybe I just misunderstand you here?


> If I understand the L4S proposal correctly, that would be to treat ECN-CE marks on ECT(0) marked flows as equivalent to loss, but treat ECN-CE marks on ECT(1) marked flows with a (much) higher threshold. 

L4S would not change anything about how ECT(0) marked flows are treated, and would CE-mark packets carrying ECT(1) with an instantaneous queue - i.e. a much *lower* threshold. But that’s not the issue - I agree there’s no problem with L4S.
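To make the codepoint distinction concrete, here is a minimal sketch, assuming the RFC 3168 encoding of the two ECN bits in the IP TOS / Traffic Class byte. The marking rule is a hypothetical toy, not the L4S specification: the names `should_mark`, `L4S_MARK_THRESHOLD_S` and `CLASSIC_TARGET_S`, and their values, are illustrative only. It shows an ECT(1) packet being CE-marked at a queuing delay where an ECT(0) packet would not be, i.e. a lower queue threshold for ECT(1).

```python
# ECN field values from RFC 3168: the two low-order bits of the
# IPv4 TOS / IPv6 Traffic Class byte.
ECN_CODEPOINTS = {
    0b00: "Not-ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}

# Hypothetical thresholds, for illustration only (not from any spec).
L4S_MARK_THRESHOLD_S = 0.001  # shallow threshold on the instantaneous queue delay
CLASSIC_TARGET_S = 0.015      # classic AQM target on a smoothed queue delay


def ecn_codepoint(tos_byte: int) -> str:
    """Return the ECN codepoint name carried in a TOS/Traffic Class byte."""
    return ECN_CODEPOINTS[tos_byte & 0b11]


def should_mark(codepoint: str, instantaneous_delay_s: float,
                smoothed_delay_s: float) -> bool:
    """Toy CE-marking decision: L4S-style for ECT(1), classic-style for ECT(0)."""
    if codepoint == "ECT(1)":
        # Mark as soon as the instantaneous queue exceeds a shallow threshold.
        return instantaneous_delay_s > L4S_MARK_THRESHOLD_S
    if codepoint == "ECT(0)":
        # Unchanged classic behaviour: mark only when the smoothed delay is high.
        return smoothed_delay_s > CLASSIC_TARGET_S
    # Not-ECT packets would be dropped rather than marked; CE is already marked.
    return False
```

For example, with 2 ms of queuing delay `should_mark("ECT(1)", 0.002, 0.002)` returns `True` while `should_mark("ECT(0)", 0.002, 0.002)` returns `False`.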

The compatibility problem does exist with the ABE proposal, which works off ECT(0).

The ABE proposal exploits a very simple fact: that CE-marks are, by definition, *not* the same as loss (see David Black’s previous email, where he says “if dropping or marking is based on queue occupancy, it is possible that dropping reduces queue occupancy in a fashion that marking does not”). Indeed, queue dynamics play out differently when packets are dropped or marked (see Section 7 with Figures 13/14 in <https://www.duo.uio.no/bitstream/handle/10852/37381/khademi-AQM_Kids_TR434.pdf>).

Losses may stem from a DropTail (FIFO) queue somewhere along the path - CE-marks are, however, very likely to only be caused by an AQM algorithm. TCP’s built-in reaction to loss yields full link utilization only when there’s at least a BDP worth of queuing. This is a lot of latency - when the queue is full this doubles the RTT. Modern AQM mechanisms strive to maintain a much smaller average queue size, and this is where they mark packets.
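The arithmetic behind "doubles the RTT", as a small sketch with example numbers (a 10 Mbit/s bottleneck and a 50 ms base RTT are assumptions, not measurements): a standing queue of one BDP adds exactly one base RTT of queuing delay.

```python
def bdp_bytes(rate_bps: float, base_rtt_s: float) -> float:
    """Bandwidth-delay product in bytes for a given bottleneck rate and base RTT."""
    return rate_bps * base_rtt_s / 8


def rtt_with_queue(base_rtt_s: float, queue_bytes: float, rate_bps: float) -> float:
    """RTT including the delay of draining a standing queue at the bottleneck."""
    return base_rtt_s + queue_bytes * 8 / rate_bps


rate = 10e6       # example bottleneck rate: 10 Mbit/s
base_rtt = 0.050  # example base RTT: 50 ms
bdp = bdp_bytes(rate, base_rtt)             # 62,500 bytes
full = rtt_with_queue(base_rtt, bdp, rate)  # 0.1 s, twice the base RTT
```

An AQM that holds the average queue to a few milliseconds keeps the RTT near 50 ms here, which is why its marks say less about "how much" congestion there is than a drop from a full BDP-sized buffer.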

So: if we react to CE-marks the same way as to loss, we underutilize the link when CoDel or PIE is in place.

Thus, it makes more sense to interpret the signal for what it is: an indication that there was congestion, but from a queue that might be much smaller than a BDP.
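A minimal sketch of that idea, assuming a window-based sender: it reacts at most once per RTT, as classic ECN expects, but applies a milder multiplicative decrease on CE than on loss. `BETA_LOSS = 0.5` is Reno-style halving; `BETA_ECN = 0.8` and the two-packet window floor are hypothetical, illustrative values, not numbers taken from the ABE proposal.

```python
from dataclasses import dataclass

BETA_LOSS = 0.5  # Reno-style halving on loss
BETA_ECN = 0.8   # hypothetical, milder backoff on CE (illustrative value)
MIN_CWND = 2.0   # hypothetical floor on the window, in packets


@dataclass
class Sender:
    cwnd: float                               # congestion window, in packets
    last_reduction_t: float = float("-inf")   # time of the last window reduction

    def on_congestion_signal(self, now: float, rtt: float, is_ce: bool) -> None:
        """React at most once per RTT; back off less for CE than for loss."""
        if now - self.last_reduction_t < rtt:
            return  # already reduced within this RTT: ignore further signals
        beta = BETA_ECN if is_ce else BETA_LOSS
        self.cwnd = max(MIN_CWND, self.cwnd * beta)
        self.last_reduction_t = now
```

With `Sender(cwnd=100.0)` and an RTT of 0.1 s, a CE mark at t = 0 shrinks the window to 80 packets, a second CE mark at t = 0.05 is ignored, and a loss on a fresh sender would instead halve it to 50.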


> Assuming, in all cases, that there’s a parallel congestion control algorithm running

If you assume that there’s a parallel congestion control algorithm running, I understand even less why you want to feed ECN CE-marks into the circuit breaker. The congestion control algorithm should already deal with them.
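To illustrate keeping the two roles apart, here is a hypothetical sketch of a last-resort breaker that ignores CE marks entirely (they are left to the congestion controller) and trips only on a sustained, large excess over a loss-based estimate. Assumptions: the simplified TCP throughput equation X = s / (R * sqrt(2p/3)) serves as the estimate, `loss_fraction` would come from something like the RTCP receiver-report fraction lost, and the tolerance factor (10) and number of consecutive reports (3) are illustrative parameters chosen for this sketch. The class and function names are made up for illustration.

```python
import math


def tcp_rate_estimate(seg_bytes: float, rtt_s: float, loss_fraction: float) -> float:
    """Simplified TCP throughput equation, in bytes/s: s / (R * sqrt(2p/3))."""
    if loss_fraction <= 0:
        return math.inf  # no loss observed: no meaningful upper bound
    return seg_bytes / (rtt_s * math.sqrt(2 * loss_fraction / 3))


class LossOnlyCircuitBreaker:
    """Hypothetical last-resort breaker; CE marks are handled by congestion control."""

    def __init__(self, factor: float = 10.0, intervals: int = 3):
        self.factor = factor        # hypothetical tolerance multiple over the estimate
        self.intervals = intervals  # consecutive violating reports needed to trip
        self.violations = 0

    def on_report(self, send_rate: float, seg_bytes: float, rtt_s: float,
                  loss_fraction: float) -> bool:
        """Process one receiver report; return True if the breaker trips."""
        limit = self.factor * tcp_rate_estimate(seg_bytes, rtt_s, loss_fraction)
        if send_rate > limit:
            self.violations += 1
        else:
            self.violations = 0  # any compliant report resets the count
        return self.violations >= self.intervals
```

With 1000-byte packets, a 100 ms RTT and 6% loss the estimate is about 50,000 bytes/s, so a sender pushing 1,000,000 bytes/s trips the breaker on the third consecutive report; with no loss at all it never trips, however many CE marks the AQM produces.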
 

> (and RMCAT has figured out the right congestion response for that; the proposals now treat ECN-CE and loss very similarly).

I disagree that this is the “right” congestion response. It’s a workable one, sure. Nothing extremely terrible will happen if congestion controllers treat ECN-CE and loss similarly - it just yields unnecessarily poor utilization with ECN, with modern AQMs (unless one backs off by less than TCP would in response to loss too, which is good if there’s an AQM in place but may be quite bad otherwise).

Bottom line: it really does mean something different, and it seems wrong to me to act as if that wasn’t the case - just because we’ve always done so.

Cheers,
Michael