Re: [AVTCORE] [rtcweb] [tsvwg] WG Last Call on changes: draft-ietf-avtcore-rtp-circuit-breakers-16

"Black, David" <david.black@emc.com> Wed, 29 June 2016 14:44 UTC

From: "Black, David" <david.black@emc.com>
To: Michael Welzl <michawe@ifi.uio.no>, Colin Perkins <csp@csperkins.org>
Date: Wed, 29 Jun 2016 14:44:09 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/xadQSVsnJk11jjhIpspCJI4OiEo>
Cc: "rtcweb@ietf.org" <rtcweb@ietf.org>, tsvwg <tsvwg@ietf.org>, IETF AVTCore WG <avt@ietf.org>
Subject: Re: [AVTCORE] [rtcweb] [tsvwg] WG Last Call on changes: draft-ietf-avtcore-rtp-circuit-breakers-16

I wrote: 

>> Another possible rationale for this mixing is that if drops start occurring, then many of
>> the new and proposed uses of ECN that treat ECN-CE marks as less than loss-equivalent
>> are outside their intended operating envelopes/regions.

Colin responded:

> Clearly if the queue has been driven to overflow, so that packet loss is
> occurring, then the AQM is outside its intended operating regime. I’m not sure
> we need to push it so far, though. Is there not a regime where the ECN-CE
> marking rate indicates excessive congestion, before the queue overflows and
> drops packets?

Yes, but that may not be relevant to this discussion. My hypothesis is that actual
drops will occur well before RTP is running at 10x the TCP-equivalent rate (computed
from drops and ECN-CE marks), and that 10x factor is the trip threshold for the
circuit breaker, which is the focus of this discussion.

I would think that the "regime where the ECN-CE marking rate indicates excessive
congestion, before the queue overflows and drops packets" would not extend as
far as RTP running at 10x the TCP-equivalent rate, which is where the RTP circuit
breaker trips.

This is all a thought exercise - I'm happy to be shown to be wrong based on actual
data and experience with "running code" ...
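
As a rough illustration of where the 10x threshold sits, assume the commonly cited
simplified TCP throughput equation, with X the TCP-equivalent rate, s the packet size,
R the round-trip time, p the loss fraction fed into the equation, and S the RTP sending
rate. The exact form the draft uses is given in its Section 4.3; the form below and the
example values are assumptions for illustration only, not measurements.

```latex
X = \frac{s}{R\sqrt{2p/3}}, \qquad
S > 10X \iff p > \frac{3}{2}\left(\frac{10\,s}{R\,S}\right)^{2}
% Example (assumed values): s = 1200 bytes, R = 0.1 s, S = 1.25e6 bytes/s
% => p > 1.5 * (0.096)^2 = 0.013824, i.e. about 1.4%
```

Under those assumptions, the combined drop and mark fraction would have to exceed
roughly 1.4% before the 10x test is met; whether drops appear before that point
depends on the AQM and is exactly what real data would need to show.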

To which Michael responded: 

> Shouldn’t a congestion control mechanism react well before that?

Yes, if there is one ;-).  This is RTP ...

Thanks, --David

> -----Original Message-----
> From: Michael Welzl [mailto:michawe@ifi.uio.no]
> Sent: Wednesday, June 29, 2016 4:55 AM
> To: Colin Perkins
> Cc: Black, David; rtcweb@ietf.org; tsvwg; IETF AVTCore WG
> Subject: Re: [AVTCORE] [rtcweb] [tsvwg] WG Last Call on changes:
> draft-ietf-avtcore-rtp-circuit-breakers-16
> 
> 
> > On 29 Jun 2016, at 00:02, Colin Perkins <csp@csperkins.org> wrote:
> >
> >
> >> On 28 Jun 2016, at 02:04, Black, David <david.black@emc.com> wrote:
> >>
> >> Trying to shorten up this thread again ...
> >>
> >>>>>> I'm not quite sure how to specify "use of ECN as additional evidence" of
> >>>>>> "excessive congestion" as drop-equivalence is about the best we have
> >>>>>> for current guidance.
> >>>>>
> >>>>> I fail to parse that sentence, so maybe I’m getting you wrong, but anyway I
> >>>>> wonder: what’s even the point of this?
> >>>>> Why even bother considering CE-marks as information for a circuit breaker?
> >>>>
> >>>> Because the alternative is that we only break the circuit once the queue has
> >>>> been driven into overflow, and packets have been lost. We want to avoid that,
> >>>> since it causes latency, and too much latency is very bad for the user
> >>>> experience.
> >>>
> >>> Well - the better way out would be for the application to react. Maybe this is me
> >>> misunderstanding the circuit breaker, but I did think it’s more like a last resort…
> >>> you just don’t want to be trigger-happy with such a thing?
> >>
> >> Well, the RTP circuit breaker draft is not trigger happy - for its congestion circuit
> >> breaker to trip, RTP has to be sending at 10x the rate that TCP would send under
> >> those conditions, based on the TCP throughput equation.  See:
> >>
> >> https://tools.ietf.org/html/draft-ietf-avtcore-rtp-circuit-breakers-16#section-4.3
> >>
> >> The issue here is - when calculating the comparable TCP throughput, how are ECN-CE
> >> marks used to determine the loss rate input to the TCP throughput equation? Do
> >> ECN-CE marked packets count as having arrived or having been dropped?
> >
> > Right - or do they count somewhere between the two.
> 
> Let’s see them clearly for what they are.
> They mean: the path is *not* broken (they have arrived!), and probably an
> AQM mechanism, potentially using a shallow queue, marked them to indicate
> congestion. I think “somewhere between the two” really doesn’t capture this
> well.
> 
> 
> >> When things are relatively stable and the ECN-CE marks are being used to nudge
> >> the sender's rate based on what the network can absorb, whether ECN-CE marks
> >> count as losses or not is probably immaterial - the 10x divergence from the TCP
> >> throughput equation's rate is not going to arise, and the circuit breaker won't trip.
> >> The circuit breaker is only supposed to trip when things are seriously wrong.
> >
> > Correct.
> >
> >> (1) If the RTP congestion circuit breaker trips based on ECN-CE marks alone,
> >> something feels intuitively wrong - how'd we get to RTP running at 10x the
> >> comparable TCP sending rate with no losses?  Perhaps the circuit breaker
> >> shouldn’t trip on ECN-CE marks alone?
> >
> > Shouldn’t the comparable rate to trigger the circuit breaker be 10x that given to
> > a TCP flow subject to the same ECN-CE marking rate? If the TCP treats ECN-CE as
> > equivalent to loss, for congestion response, then the circuit breaker should do so
> > too, etc.
> 
> First, TCP shouldn’t (treat ECN-CE as equivalent to loss), and so the circuit breaker
> shouldn’t.
> Second, I guess you’re talking about the equation. Well that goes completely
> wrong anyway (the derivation assumes packets to be lost, not marked; then
> again, you’re using loss, not the loss event ratio; then again, you’re close to this
> with ECN perhaps, using “traditional” ECN receiver behavior).
> 
> 
> >> (2) At the other extreme, the congestion circuit breaker clearly has to trip if RTP
> >> gets to 10x the comparable TCP sending rate based on losses alone.  This is the
> >> baseline for the circuit breaker to provide network protection as intended.
> >>
> >> So, going back to Gorry's suggestion to use ECN-CE marks as "additional evidence,"
> >> here's a straw proposal to shoot at ... factor in ECN-CE marks as additional losses
> >> *only when* losses are already occurring.
> 
> I think this is very reasonable.
> 
> 
> >> For example, we could specify that for the RTP congestion circuit breaker to trip,
> >> the RTP sending rate has to be:
> >> 	- 10x the equivalent TCP sending rate based on counting ECN-CE marked
> >> 		packets as lost; AND
> >> 	- 3x the equivalent sending rate based on actual drops (i.e., counting
> >> 		ECN-CE marked packets as delivered).
> >> The "3x" above is an off-the-top-of-my-head factor that attempts to roughly
> >> equally weight the inputs (3 is close to the square root of 10) - pick a different
> >> number if that weighting feels wrong.
> >>
> >> This would force drops to occur and then consider ECN-CE marks as additional
> >> evidence that something is wrong in the network.
> >>
> >> Another possible rationale for this mixing is that if drops start occurring, then
> >> many of the new and proposed uses of ECN that treat ECN-CE marks as less than
> >> loss-equivalent are outside their intended operating envelopes/regions.
> >
> > Clearly if the queue has been driven to overflow, so that packet loss is
> > occurring, then the AQM is outside its intended operating regime. I’m not sure
> > we need to push it so far, though. Is there not a regime where the ECN-CE
> > marking rate indicates excessive congestion, before the queue overflows and
> > drops packets?
> 
> Shouldn’t a congestion control mechanism react well before that?
> 
> Cheers,
> Michael
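
The straw proposal quoted above (trip only when the RTP sending rate exceeds both 10x
the TCP-equivalent rate with ECN-CE marks counted as lost and 3x the TCP-equivalent
rate with marks counted as delivered) can be sketched as follows. This is a minimal
sketch under stated assumptions, not the draft's algorithm: the simplified throughput
formula, the function names, and the per-interval packet counts are all hypothetical,
and the draft's Section 4.3 defines the equation and measurement intervals it really uses.

```python
import math


def tcp_equivalent_rate(packet_size, rtt, loss_fraction):
    """Simplified TCP throughput estimate in bytes/second.

    Assumed form: X = s / (R * sqrt(2p/3)). The draft's own equation
    may differ; see its Section 4.3.
    """
    if loss_fraction <= 0:
        # With no loss signal the equation gives no finite bound.
        return math.inf
    return packet_size / (rtt * math.sqrt(2.0 * loss_fraction / 3.0))


def straw_proposal_trips(send_rate, packet_size, rtt, sent, dropped, ce_marked,
                         ce_factor=10.0, drop_factor=3.0):
    """Return True only if both conditions of the straw proposal hold.

    send_rate is in bytes/second; sent, dropped and ce_marked are
    hypothetical packet counts over one reporting interval.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    # Condition 1: ECN-CE marked packets are counted as lost.
    p_ce_as_loss = (dropped + ce_marked) / sent
    # Condition 2: ECN-CE marked packets are counted as delivered.
    p_drops_only = dropped / sent
    over_with_ce = send_rate > ce_factor * tcp_equivalent_rate(
        packet_size, rtt, p_ce_as_loss)
    over_drops_only = send_rate > drop_factor * tcp_equivalent_rate(
        packet_size, rtt, p_drops_only)
    return over_with_ce and over_drops_only


# ECN-CE marks alone, no drops: the second condition cannot be met.
print(straw_proposal_trips(1_000_000, 1200, 0.1, sent=1000, dropped=0, ce_marked=60))
# Drops plus marks at the same sending rate: both conditions are met.
print(straw_proposal_trips(1_000_000, 1200, 0.1, sent=1000, dropped=15, ce_marked=45))
```

With these made-up numbers the first call prints False and the second prints True,
which matches the intent that marks only count as additional evidence once drops are
already occurring.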