Re: [codec] Thresholds and delay.

stephen botzko <> Tue, 11 May 2010 18:22 UTC

> In the presence of echo, round-trip delay must be kept below 30 ms to
> ensure that the echo is perceived as sidetone, according to the Springer
> handbook of speech processing.

Though true, I don't think this is a mainstream consideration.

VOIP phones that are capable of speakerphone operation all have acoustic
echo cancelers, and those cancelers are already tuned to deal with internet
delays with other voice codecs.  Certainly our phones and videoconferencing
systems do not have problems with path delays of this order (hundreds of
milliseconds).

From my own experience (not testing) I agree with Brian's claim that 500 ms
round trip is acceptable for most conversation.

It does depend on what you are doing, and there are certainly tasks where
much lower delays are needed.

Stephen Botzko

On Tue, May 11, 2010 at 2:06 PM, Ben Schwartz wrote:

> On Tue, 2010-05-11 at 12:48 -0400, Marshall Eubanks wrote:
> > As a point of order, I object to any graphs without an available paper
> > behind them.
> I have located the first paper mentioned by Christian Hoene,
> but of course it's paywalled.
> One test in that paper told trained subjects to "Take turns reading
> random numbers aloud as fast as possible", on a pair of handsets with
> narrowband uncompressed audio and no echo.  Subjects were able to detect
> round-trip delays down to 90 ms.  Conversational efficiency was impaired
> even with round-trip delay of 100 ms.
> Let me emphasize again that these delays are round-trip, not one-way,
> there is no echo, and the task, while designed to expose latency, is
> probably less demanding than musical performance.
> In the presence of echo, round-trip delay must be kept below 30 ms to
> ensure that the echo is perceived as sidetone, according to the Springer
> handbook of speech processing.
> Such low delays are clearly impossible on many paths, but for Boston to
> New York City (or London to Paris), ping times can be less than 18 ms,
> making echo->sidetone conversion just barely possible for a codec with
> 5ms frames.
> I accept Brian Rosen's claim that a slow conversation doesn't normally
> suffer greatly from round-trip latencies up to 500 ms, but under some
> circumstances much lower latencies are valuable.  Let's make sure
> they're achievable for those who can use them.
> --Ben
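
For reference, the "just barely possible" arithmetic in Ben's message can be checked with a rough delay budget. This is only a sketch: the 30 ms limit, 18 ms ping time, and 5 ms frame size come from the quoted text, while the assumption that each direction adds exactly one frame of buffering (and nothing for lookahead, jitter buffering, processing, or the acoustic path) is mine, as is the helper name round_trip_ms.

```python
# Rough round-trip delay budget for the echo-as-sidetone case.
# Figures are taken from the quoted message; the delay model is an assumption.
SIDETONE_LIMIT_MS = 30.0  # round-trip limit for echo to be heard as sidetone
NETWORK_RTT_MS = 18.0     # ping time quoted for Boston to New York City
FRAME_MS = 5.0            # codec frame size


def round_trip_ms(network_rtt_ms, frame_ms, frames_per_direction=1):
    """Network round trip plus framing delay in both directions.

    Assumption: each direction buffers `frames_per_direction` frames before
    sending, and no other delay source is counted.
    """
    codec_delay_ms = 2 * frames_per_direction * frame_ms
    return network_rtt_ms + codec_delay_ms


total_ms = round_trip_ms(NETWORK_RTT_MS, FRAME_MS)
margin_ms = SIDETONE_LIMIT_MS - total_ms
print(f"round trip: {total_ms} ms, margin: {margin_ms} ms")
```

Under these assumptions the round trip is 28 ms, which leaves 2 ms of margin; a 10 ms frame would already exceed the limit.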