Re: [AVTCORE] RTP circuit breakers practical experience feedback

Colin Perkins <csp@csperkins.org> Mon, 10 August 2015 21:32 UTC

From: Colin Perkins <csp@csperkins.org>
In-Reply-To: <55C4C0B0.1080100@jive.com>
Date: Mon, 10 Aug 2015 22:32:46 +0100
Message-Id: <2838C1DA-1418-4B1B-BA6A-8CB542B2DD64@csperkins.org>
References: <55C4C0B0.1080100@jive.com>
To: Simon Perreault <sperreault@jive.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/avt/X3ZPRqAqyx7RM94dlwiUw00dhco>
Cc: "avt@ietf.org" <avt@ietf.org>
Subject: Re: [AVTCORE] RTP circuit breakers practical experience feedback

Simon, 

Many thanks for implementing this, and reporting back to the group.

On 7 Aug 2015, at 15:29, Simon Perreault <sperreault@jive.com> wrote:
> All,
> 
> We implemented draft-ietf-avtcore-rtp-circuit-breakers-10 and put it in
> production here at Jive. It is exposed to all kinds of traffic:
> intra-DC, inter-DC, SIP peer to SIP peer, and wild public Internet
> traffic. I obviously can't discuss traffic volume. Here are some lessons
> learned...
> 
> - The "media timeout" circuit breaker is unusable in practice because at
> least one popular B2BUA implementation we routinely talk to (not going
> to name names) lies in its RTCP reports. In one particular instance, it
> sends RTCP with non-increasing maximum sequence numbers even though it
> is receiving the media we send it. We have found no way to work around
> this in general apart from fingerprinting the implementation of the
> remote endpoint (which we are *not* considering, for obvious reasons).
> 
> - The "RTCP timeout" circuit breaker is unusable in practice because it
> is common to stop receiving RTCP when an RTP relay we are sending to
> changes its destination from an RTCP-sending receiver to a
> non-RTCP-sending receiver. There is no indication at the signalling
> level that this is happening because what happens beyond the RTP relay
> is invisible to us. We have found no way to work around this in general.

These are clearly bugs in the devices you're communicating with. I agree that if such devices are common in some environments, then an RTP circuit breaker that checks connectivity using RTCP will be difficult to deploy in those environments. I'm not sure there's much we can do about this, other than report the bugs and hope the broken devices eventually get fixed. I will make sure the circuit breaker draft notes the issue. 
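
As a rough illustration of the two RTCP-based checks under discussion, the following Python sketch tracks them for a single remote receiver. It is only a sketch under stated assumptions: the class name, method names, and both thresholds are hypothetical placeholders, not values or names taken from the draft.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeoutBreakers:
    """Tracks the media-timeout and RTCP-timeout conditions for one receiver."""
    max_stalled_reports: int = 3        # hypothetical threshold, not from the draft
    max_silent_intervals: float = 3.0   # hypothetical threshold, not from the draft
    session_start: float = 0.0
    last_ext_seq: Optional[int] = None
    stalled_reports: int = 0
    last_rtcp_time: Optional[float] = None

    def on_report(self, now: float, ext_highest_seq: int, media_sent: bool) -> bool:
        """Process one RTCP report block; return True if the media timeout trips."""
        self.last_rtcp_time = now
        if self.last_ext_seq is None:
            # First report: nothing to compare against yet.
            self.last_ext_seq = ext_highest_seq
            self.stalled_reports = 0
            return False
        if media_sent and ext_highest_seq <= self.last_ext_seq:
            # We sent media, but the receiver reports no progress.
            self.stalled_reports += 1
        else:
            self.stalled_reports = 0
        self.last_ext_seq = max(self.last_ext_seq, ext_highest_seq)
        return self.stalled_reports >= self.max_stalled_reports

    def rtcp_timed_out(self, now: float, reporting_interval: float) -> bool:
        """Return True if no RTCP has arrived for too many reporting intervals."""
        reference = self.session_start if self.last_rtcp_time is None else self.last_rtcp_time
        return (now - reference) > self.max_silent_intervals * reporting_interval
```

Note that a peer reporting a frozen sequence number while actually receiving media would trip `on_report`, and a relay that switches to a non-RTCP-sending receiver would trip `rtcp_timed_out`. Those are exactly the two false positives described above, and nothing in the sender's view distinguishes them from a genuine failure.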

> - The "congestion" circuit breaker has never triggered. I don't really
> know how to interpret this: am I supposed to rejoice or is the circuit
> breaker simply useless in practice?

That it has never triggered perhaps addresses the concern that the circuit breaker is overly sensitive...

(If you have data you're able to share privately, I'd be interested in learning more about the performance you're seeing, and how the circuit breaker is behaving in your deployment.)

> In addition, the text describing the algorithm leaves much to the
> programmer's interpretation. I've discussed this with the authors and I
> expect a revision with tighter text.

We're working on this.

> - We have not implemented the "media usability" circuit breaker as we
> were unable to find a good practical criterion for usability. I'd be
> surprised if anyone else did.

I think it depends very much on the application. If I were implementing an interactive video conferencing tool, I might treat a series of consecutive RTCP reports showing an RTT in excess of a couple of seconds as unusable media, for example. It's a deliberately vague fallback: if what you're receiving is clearly unusable for your application, consider stopping sending.
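
To make the RTT example concrete, here is a minimal Python sketch of one possible usability check. The round-trip calculation (arrival time minus LSR minus DLSR, in units of 1/65536 seconds) follows RFC 3550; everything else is an assumption for illustration: the names `rtt_seconds` and `UsabilityBreaker` are hypothetical, the 2-second limit stands in for "a couple of seconds", and three reports in a row is an arbitrary reading of "a series".

```python
from dataclasses import dataclass

RTT_UNITS_PER_SECOND = 65536  # LSR and DLSR are expressed in units of 1/65536 s


def rtt_seconds(arrival: int, lsr: int, dlsr: int) -> float:
    """Round-trip time from one report block: arrival - LSR - DLSR, modulo 2**32.

    `arrival` is the middle 32 bits of the NTP time at which the report was
    received. Callers should skip report blocks with lsr == 0, which means
    the reporter has not yet received a sender report.
    """
    return ((arrival - lsr - dlsr) & 0xFFFFFFFF) / RTT_UNITS_PER_SECOND


@dataclass
class UsabilityBreaker:
    rtt_limit: float = 2.0    # assumed stand-in for "a couple of seconds"
    max_bad_reports: int = 3  # hypothetical: how many in a row count as "a series"
    bad_reports: int = 0

    def on_rtt(self, rtt: float) -> bool:
        """Record one RTT sample; return True once the media looks unusable."""
        self.bad_reports = self.bad_reports + 1 if rtt > self.rtt_limit else 0
        return self.bad_reports >= self.max_bad_reports
```

A single good report resets the counter, so only a sustained run of high RTTs trips the check. Whether that is the right criterion depends on the application, as noted above.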

Cheers,
Colin

-- 
Colin Perkins
https://csperkins.org/