Re: [AVTCORE] [rtcweb] [tsvwg] WG Last Call on changes: draft-ietf-avtcore-rtp-circuit-breakers-16

Michael Welzl <michawe@ifi.uio.no> Wed, 29 June 2016 08:54 UTC

> On 29. jun. 2016, at 00.02, Colin Perkins <csp@csperkins.org> wrote:
> 
> 
>> On 28 Jun 2016, at 02:04, Black, David <david.black@emc.com> wrote:
>> 
>> Trying to shorten up this thread again ...
>> 
>>>>>> I'm not quite sure how to specify "use of ECN as additional evidence" of
>>>>>> "excessive congestion" as drop-equivalence is about the best we have
>>>>>> for current guidance.
>>>>> 
>>>>> I fail to parse that sentence, so maybe I’m getting you wrong, but anyway I
>>>>> wonder: what’s even the point of this?
>>>>> Why even bother considering CE-marks as information for a circuit breaker?
>>>> 
>>>> Because the alternative is that we only break the circuit once the queue has
>>>> been driven into overflow, and packets have been lost. We want to avoid that,
>>>> since it causes latency, and too much latency is very bad for the user experience.
>>> 
>>> Well - the better way out would be for the application to react. Maybe this is me
>>> misunderstanding the circuit breaker, but I did think it’s more like a last resort…
>>> you just don’t want to be trigger-happy with such a thing?
>> 
>> Well, the RTP circuit breaker draft is not trigger happy - for its congestion circuit
>> breaker to trip, RTP has to be sending at 10x the rate that TCP would send under
>> those conditions, based on the TCP throughput equation.  See:
>> 
>> https://tools.ietf.org/html/draft-ietf-avtcore-rtp-circuit-breakers-16#section-4.3
>> 
>> The issue here is - when calculating the comparable TCP throughput, how are ECN-CE
>> marks used to determine the loss rate input to the TCP throughput equation?  Do
>> ECN-CE marked packets count as having arrived or having been dropped?
> 
> Right - or do they count somewhere between the two.

Let’s see them clearly for what they are.
They mean: the path is *not* broken (they have arrived!), and probably an AQM mechanism, potentially using a shallow queue, marked them to indicate congestion. I think “somewhere between the two” really doesn’t capture this well.


>> When things are relatively stable and the ECN-CE marks are being used to nudge
>> the sender's rate based on what the network can absorb, whether ECN-CE marks
>> count as losses or not is probably immaterial - the 10x divergence from the TCP
>> throughput equation's rate is not going to arise, and the circuit breaker won't trip.
>> The circuit breaker is only supposed to trip when things are seriously wrong.
> 
> Correct.
> 
>> (1) If the RTP congestion circuit breaker trips based on ECN-CE marks alone,
>> something feels intuitively wrong - how'd we get to RTP running at 10x the
>> comparable TCP sending rate with no losses?  Perhaps the circuit breaker
>> shouldn’t trip on ECN-CE marks alone?
> 
> Shouldn’t the comparable rate to trigger the circuit breaker be 10x that given to a TCP flow subject to the same ECN-CE marking rate? If the TCP treats ECN-CE as equivalent to loss, for congestion response, then the circuit breaker should do so too, etc.

First, TCP shouldn’t treat ECN-CE as equivalent to loss, and so neither should the circuit breaker.
Second, I guess you’re talking about the equation. Well, that goes completely wrong anyway: the derivation assumes packets are lost, not marked. Then again, you’re using the loss rate, not the loss event ratio; and then again, with ECN and “traditional” ECN receiver behavior, you’re perhaps close to the loss event ratio.
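
To illustrate how much the choice of loss input matters, here is a minimal Python sketch. It assumes the simplified (Mathis-style) TCP throughput equation X = s / (R * sqrt(2p/3)); whether that is exactly the form used in the draft is an assumption here. The function names and the packet counts are hypothetical, picked only for illustration.

```python
import math

def tcp_rate(s_bytes, rtt_s, p):
    """Simplified TCP throughput in bytes/s: s / (R * sqrt(2p/3)).

    Assumption: Mathis-style simplified equation; p is a plain loss
    fraction here, not a loss event ratio.
    """
    if p <= 0:
        return math.inf  # no loss signal: the equation gives no finite bound
    return s_bytes / (rtt_s * math.sqrt(2.0 * p / 3.0))

def loss_fraction(sent, dropped, ce_marked, count_ce_as_loss):
    """Loss input to the equation, with CE marks counted as lost or as delivered."""
    lost = dropped + (ce_marked if count_ce_as_loss else 0)
    return lost / sent

# Hypothetical reporting interval: 1000 packets sent, 5 dropped, 45 CE-marked.
sent, dropped, ce = 1000, 5, 45
s, rtt = 1200, 0.1  # packet size in bytes, RTT in seconds (made-up values)

rate_drops_only = tcp_rate(s, rtt, loss_fraction(sent, dropped, ce, False))
rate_ce_as_loss = tcp_rate(s, rtt, loss_fraction(sent, dropped, ce, True))

# The computed rate scales with 1/sqrt(p), so a 10x larger p gives a
# reference rate that is lower by a factor of sqrt(10).
print(rate_drops_only / rate_ce_as_loss)
```

With these made-up numbers, counting CE marks as losses raises p by a factor of ten and so lowers the reference rate by a factor of sqrt(10). This says nothing about whether the equation is valid for marks in the first place.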


>> (2) At the other extreme, the congestion circuit breaker clearly has to trip if RTP
>> gets to 10x the comparable TCP sending rate based on losses alone.  This is the
>> baseline for the circuit breaker to provide network protection as intended.
>> 
>> So, going back to Gorry's suggestion to use ECN-CE marks as "additional evidence,"
>> here's a straw proposal to shoot at ... factor in ECN-CE marks as additional losses
>> *only when* losses are already occurring.   

I think this is very reasonable.


>> For example, we could specify that for the RTP congestion circuit breaker to trip, the
>> RTP sending rate has to be:
>> 	- 10x the equivalent TCP sending rate based on counting ECN-CE marked
>> 		packets as lost; AND
>> 	- 3x the equivalent sending rate based on actual drops (i.e., counting
>> 		ECN-CE marked packets as delivered).
>> The "3x" above is an off-the-top-of-my-head factor that attempts to roughly
>> equally weight the inputs (3 is close to the square root of 10) - pick a different
>> number if that weighting feels wrong.
>> 
>> This would force drops to occur and then consider ECN-CE marks as additional evidence
>> that something is wrong in the network.
>> 
>> Another possible rationale for this mixing is that if drops start occurring, then many of
>> the new and proposed uses of ECN that treat ECN-CE marks as less than loss-equivalent
>> are outside their intended operating envelopes/regions.
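
To make the straw proposal quoted above concrete, here is a minimal Python sketch of the two-part test. It assumes the simplified TCP throughput equation X = s / (R * sqrt(2p/3)) and looks at a single reporting interval only, leaving out any other conditions the draft applies. The function names and all numbers are hypothetical; the 10x and 3x factors are the ones from the proposal.

```python
import math

def tcp_rate(s_bytes, rtt_s, p):
    """Simplified TCP throughput in bytes/s (assumed form: s / (R * sqrt(2p/3)))."""
    if p <= 0:
        return math.inf  # no loss signal: no finite reference rate
    return s_bytes / (rtt_s * math.sqrt(2.0 * p / 3.0))

def breaker_trips(send_rate, s_bytes, rtt_s, sent, dropped, ce_marked):
    """Straw proposal: trip only if BOTH conditions hold.

    1. send_rate > 10x the TCP rate with CE-marked packets counted as lost
    2. send_rate > 3x the TCP rate with CE-marked packets counted as delivered
    """
    p_with_ce = (dropped + ce_marked) / sent
    p_drops_only = dropped / sent
    exceeds_with_ce = send_rate > 10 * tcp_rate(s_bytes, rtt_s, p_with_ce)
    exceeds_drops_only = send_rate > 3 * tcp_rate(s_bytes, rtt_s, p_drops_only)
    return exceeds_with_ce and exceeds_drops_only

# Hypothetical sender: 1 MB/s, 1200-byte packets, 100 ms RTT, 1000 packets sent.
print(breaker_trips(1e6, 1200, 0.1, 1000, dropped=0, ce_marked=500))
print(breaker_trips(1e6, 1200, 0.1, 1000, dropped=1, ce_marked=200))
print(breaker_trips(1e6, 1200, 0.1, 1000, dropped=100, ce_marked=0))
```

In this sketch the second condition can never hold without drops, because the drops-only reference rate is then unbounded. That matches the stated intent of forcing drops to occur before CE marks count as additional evidence.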
> 
> Clearly if the queue has been driven to overflow, so that packet loss is occurring, then the AQM is outside its intended operating regime. I’m not sure we need to push it so far, though. Is there not a regime where the ECN-CE marking rate indicates excessive congestion, before the queue overflows and drops packets? 

Shouldn’t a congestion control mechanism react well before that?

Cheers,
Michael