Re: [codec] #6: Echo and Ultra-low Delay

"codec issue tracker" <trac@tools.ietf.org> Mon, 24 May 2010 14:18 UTC

#6: Echo and Ultra-low Delay
------------------------------------+---------------------------------------
 Reporter:  hoene@…                 |        Owner:        
     Type:  defect                  |       Status:  closed
 Priority:  major                   |    Milestone:        
Component:  requirements            |      Version:        
 Severity:  Active WG Document      |   Resolution:  fixed 
 Keywords:                          |  
------------------------------------+---------------------------------------
Changes (by hoene@…):

  * status:  new => closed
  * resolution:  => fixed


Comment:

 [Stephen]:
 VoIP phones that are capable of speakerphone operation all have acoustic
 echo cancelers, and those cancelers are already tuned to deal with
 internet delays with other voice codecs.  Certainly our phones and
 videoconferencing systems do not have problems with path delays of this
 order (hundreds of milliseconds).

 [Benjamin]:
 In my experience, the biggest problem is in complex conferences with many
 users at different latencies, and endpoints consisting of multiple-
 microphone multiple-speaker systems scattered across a room and frequently
 repositioned.

 Anyway, I don't think it's very contentious that
 (a) on high latency links (e.g. long distances), AEC will always be needed
 in speakerphones,
 (b) the perceived audio quality in the presence of AEC is improved by very
 low round-trip acoustic latency, and
 (c) AEC may be retuned at very low round-trip acoustic latency for further
 improvement in perceived audio quality.

 Very high audio quality may not yet be a major concern in this market, but
 I think this working group is going to change that.

 [Jari]:
 Generally I believe that AECs mainly take care of the acoustic echo
 generated in the phone itself (operating on the microphone signal,
 acoustic delay up to a few ms). Do you mean that there is additional
 processing on the receiving side for the echo returning from the B user
 side?

 [Stephen]:
 Acoustic echo cancelers remove the signal from the local speaker(s) that
 is feeding back into the local microphone(s).  So if my AEC is turned off,
 then you will hear the echo (and I will not).
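 As a rough illustration of what "removing the local speaker signal from
 the local microphone signal" can look like, here is a minimal sketch of a
 normalized LMS (NLMS) adaptive filter, one textbook approach to echo
 cancellation. Nobody in this thread says their product works this way;
 the function name, tap count, step size, and the toy echo path (a single
 attenuated, delayed copy with no near-end speech) are all assumptions
 made for the example.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal.

    far_end: samples sent to the local loudspeaker (the reference signal)
    mic:     samples captured by the local microphone (contains the echo)
    Returns (residual, weights); residual is what would be sent to the far end.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # mic signal minus the estimated echo
        energy = sum(h * h for h in history) + eps
        # NLMS update: step along the reference, normalized by its energy
        weights = [w + mu * e * h / energy for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


# Toy echo path (assumption): echo is the reference delayed 3 samples, gain 0.6
random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0] * 3 + [0.6 * s for s in far_end[:-3]]

residual, weights = nlms_echo_cancel(far_end, mic)
tail_echo_energy = sum(s * s for s in mic[-200:])
tail_residual_energy = sum(s * s for s in residual[-200:])
```

 Note that the filter only ever sees the local reference and the local
 microphone signal; nothing about the network or the far end enters it.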

 The echo that you hear is delayed by the round trip transport time, the
 processing delays in the sender and receiver, and the acoustic path delay
 between the speaker and the microphone.  However, the echo that the AEC
 detects is delayed only by the acoustic path delay and whatever processing
 delay (buffering) the receiver has in its audio capture and render
 circuitry.

 So you are correct if you are thinking that the network transport time and
 codec delay do not impact the requirements for the AEC.
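 The two delay budgets described above can be sketched as follows. The
 class, the field names, and all of the millisecond values are
 hypothetical, chosen only to show which components enter each sum.

```python
from dataclasses import dataclass


@dataclass
class PathDelaysMs:
    """Illustrative delay components in milliseconds (names are assumptions)."""
    network_round_trip: float     # transport time, both directions
    codec_round_trip: float       # codec/processing delay at both ends
    local_audio_buffering: float  # capture + render buffering at the echoing end
    acoustic_path: float          # loudspeaker to microphone at the echoing end


def echo_delay_heard_by_far_end(d: PathDelaysMs) -> float:
    # The far-end listener hears the echo after every component has been added.
    return (d.network_round_trip + d.codec_round_trip
            + d.local_audio_buffering + d.acoustic_path)


def echo_delay_seen_by_aec(d: PathDelaysMs) -> float:
    # The local AEC only sees local buffering plus the acoustic path.
    return d.local_audio_buffering + d.acoustic_path


# Made-up numbers: same endpoint, once over a WAN and once over a LAN
wan = PathDelaysMs(150.0, 40.0, 20.0, 5.0)
lan = PathDelaysMs(2.0, 40.0, 20.0, 5.0)

print(echo_delay_heard_by_far_end(wan), echo_delay_seen_by_aec(wan))
print(echo_delay_heard_by_far_end(lan), echo_delay_seen_by_aec(lan))
```

 With these example values the echo heard at the far end changes with the
 network, while the delay the AEC must handle stays the same.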

 There can also be path delay between the receiver output and the actual
 speaker.  For instance in a video conferencing application, the video
 processing in the TV itself can add delay, and the TV will delay the audio
 to maintain sync.  So if your phone system can be connected to external
 sound systems, then that can affect the delays the AEC has to accommodate.

 Also, in larger rooms the acoustic path delay can be substantial since
 sound takes about 100 ms to travel 100 feet  (341 meters per second).
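 The arithmetic behind that figure, as a small sketch. The function name is
 hypothetical; the speed of sound is the 341 m/s quoted above, and the
 foot-to-meter factor is the standard 0.3048.

```python
SPEED_OF_SOUND_M_PER_S = 341.0  # value quoted in the comment above
METERS_PER_FOOT = 0.3048


def acoustic_delay_ms(distance_feet: float) -> float:
    """One-way travel time of sound over the given distance, in milliseconds."""
    distance_m = distance_feet * METERS_PER_FOOT
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


delay_100_ft = acoustic_delay_ms(100.0)
print(f"{delay_100_ft:.1f} ms")
```

 For 100 feet this works out to roughly 89 ms, which is consistent with the
 "about 100 ms" rule of thumb of a little under 1 ms per foot.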

 [Benjamin]:
 "Do you mean that there is additional processing on the receiving side for
 the echo returning from the B user side?"
 Only in the user's head.  The psychoacoustics of echo are very different
 depending on the echo time.  This means that the perceived echo-canceler
 fidelity depends on the acoustic round-trip delay.  It also means that the
 AEC should be retuned depending on the round-trip delay.

 [Marshall]:
 It is not clear to me whether the CODEC charter includes stereo (not
 typically used in telephony, but reasonably common in video
 conferencing); stereo AEC is tougher than mono and I believe that there is
 some serious IPR in this area.

 [Stephen]:
 I agree with Jari's assessment, though I am not sure exactly what you mean
 by "retuning".  The far-end user will perceive the fed-back echo in
 various ways, and that does depend on the round-trip time.  However, the
 job of the AEC is to remove the echo, so there is no fed-back echo to
 hear.  And the algorithms which remove echo do not depend on the round-
 trip delay to the far end.  At least not the good ones.

 Perhaps more substantively, I do not think this AEC discussion actually
 matters in this WG context.  We are not working on an AEC, we are working
 on an Internet Codec.  Even if (for the sake of argument) you accept the
 idea that somehow the AEC needs to be tuned to the round-trip delay, the
 round-trip delay varies enormously depending on the connection, and this
 round-trip time in general is not even discoverable (esp. if gateways or
 SBCs are used).  Nothing we are doing in this group will change any of the
 above.

 As far as sidetone goes, I do not understand why that keeps coming up
 either.

 For the speakerphone use case, eliminating AECs from both ends requires two
 conditions:
 (a) the round trip delay has to be very low (30 ms or less, from all delay
 sources), and
 (b) there has to be sufficient attenuation on the loop.

 The first condition has been brought up repeatedly.  For general internet
 WAN connections it is clearly not met, and it is in fact difficult to meet
 even on more local connections.

 The second condition has not been discussed much.  But if you do have low
 delay but do not have enough attenuation, what you get is feedback, and
 not sidetone.  Since users have control over the speaker volume, the
 second condition (in general) also cannot be guaranteed.
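 The two conditions can be written as a simple check. This is only a
 sketch: the 30 ms limit comes from condition (a) above, but the comment
 gives no number for "sufficient attenuation", so the 6 dB default below
 is a placeholder assumption, as are the function name and return strings.

```python
def speakerphone_without_aec(round_trip_ms: float,
                             loop_attenuation_db: float,
                             max_round_trip_ms: float = 30.0,
                             min_attenuation_db: float = 6.0) -> str:
    """Classify a speakerphone loop against conditions (a) and (b).

    max_round_trip_ms:  condition (a), 30 ms or less from all delay sources
    min_attenuation_db: condition (b); the default is an assumed placeholder,
                        not a value taken from the discussion
    """
    if round_trip_ms > max_round_trip_ms:
        return "needs_aec"      # condition (a) fails: delay is too high
    if loop_attenuation_db < min_attenuation_db:
        return "feedback_risk"  # low delay but too little loop attenuation
    return "ok"                 # both conditions hold


print(speakerphone_without_aec(150.0, 20.0))  # typical WAN delay
print(speakerphone_without_aec(10.0, 0.0))    # user turned the volume up
print(speakerphone_without_aec(10.0, 20.0))
```

 Because the user can change the speaker volume at any time, the second
 argument is not something an endpoint can fix in advance.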

 All speakerphones that I know of are either half-duplex or use AECs.  This
 is true even for PSTN phones used in local-loop circuit-switched calls,
 which is the lowest-delay telephony connection that there is.  So for
 telephony at least, it is clear that AECs are needed.

 So I do not think continued debate on the value of sidetone is productive.
 I agree with Michael's comment: "achieving low enough latencies for
 sidetone perception should not be a goal of the wg".

 CONSENSUS: Achieving low enough latencies for sidetone perception is out
 of scope.

-- 
Ticket URL: <http://trac.tools.ietf.org/wg/codec/trac/ticket/6#comment:2>
codec <http://tools.ietf.org/codec/>