Re: [codec] #3: 2.2. Conferencing: Support of binaural audio?
"codec issue tracker" <trac@tools.ietf.org> Sat, 01 May 2010 10:44 UTC
#3: 2.2. Conferencing: Support of binaural audio?
------------------------------------+---------------------------------------
Reporter: hoene@… | Owner:
Type: enhancement | Status: new
Priority: major | Milestone:
Component: requirements | Version:
Severity: - | Keywords:
------------------------------------+---------------------------------------
Comment(by hoene@…):
[Hoene]:
I am trying to compile the different requirements that have been mentioned
on this list.
- low complexity (with just one active speaker) vs. multiple speaker
mixing vs. spatial audio/stereo mixing
- centralized vs. distributed
- few participants vs. hundreds of listeners and talkers
- individual distribution of audio streams vs. IP multicast or RTP
group communication
- efficient encoding of multiple streams having the same content
(but different quality).
To make things easier, why not split the teleconferencing scenario into
two: high quality and scalable?
The high quality scenario, intended for a low number of users, could have
features like
- Distributed processing and mixing
- High computational resources to support spatial audio mixing (at
the receiver) and multiple encodings of the same audio stream at different
qualities (at the sender)
- Enough bandwidth to allow direct N to N transmissions of audio
streams (no multicast or group communication). This would be good for the
latency, too.
The scalable scenario is the opposite:
- Central processing and mixing for many participants.
- N to 1 and 1 to N communication using efficient distribution
mechanisms (RTP group communication and IP multicast).
- Low-complexity mixing of many streams using tricks like VAD, encoding
at the lowest rate to support many receivers on different paths, you name
it...
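To illustrate why direct N-to-N transmission only suits a low number of users, here is a minimal sketch that counts unidirectional audio streams in the two scenarios. It assumes one stream per sender/receiver pair in the distributed case and one unicast uplink plus one unicast downlink per participant in the central case; multicast would reduce the central count further. The function names are made up for this example.

```python
def mesh_streams(n):
    """Streams when each of n participants sends directly to every other one."""
    return n * (n - 1)


def bridge_streams(n):
    """Streams with a central bridge: one uplink and one unicast downlink each."""
    return 2 * n


if __name__ == "__main__":
    for n in (3, 5, 10, 100):
        print(f"{n:>3} participants: mesh={mesh_streams(n):>5} bridge={bridge_streams(n):>4}")
```

The mesh count grows quadratically while the bridge count grows linearly, which is the main reason for separating the two scenarios.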
High quality:
- Much the same requirements as for end-to-end audio transmission:
high quality and low latency.
- Maybe additionally: variable bit rate encoding to achieve a
multiplexing gain at the receiver
- and thus, a fast control loop to cope with variable bitrates on
transmission paths.
- Maybe stereo/multichannel support to send the spatial audio to the
headphone or loudspeakers.
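The multiplexing gain mentioned above can be made concrete with a small sketch. It assumes the gain is measured as the sum of the per-stream peak rates divided by the peak of the aggregate rate; that definition, the function name, and the rate values are assumptions for illustration, not figures from this discussion.

```python
def multiplexing_gain(rate_traces):
    """rate_traces: one list of per-frame bit rates (e.g. kbit/s) per stream,
    all of the same length. Returns sum-of-peaks / peak-of-sum."""
    sum_of_peaks = sum(max(trace) for trace in rate_traces)
    peak_of_sum = max(sum(rates) for rates in zip(*rate_traces))
    return sum_of_peaks / peak_of_sum


if __name__ == "__main__":
    # Hypothetical traces: each talker is at a high rate in a different frame.
    traces = [
        [32, 8, 8, 8],
        [8, 32, 8, 8],
        [8, 8, 32, 8],
    ]
    print(multiplexing_gain(traces))
```

A gain above 1 means the receiver's downlink can be provisioned for less than the sum of the individual peak rates, provided the talkers rarely peak at the same time.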
Scalable:
- Efficient encoding/transcoding for multiple different qualities
(at the conference bridge)
- The control loop need not react fast, because (multicast) group
communication requires encoding at low quality anyhow.
- Receiver-side activity detection for music and voice having low
complexity (for the conference bridge)
- Efficient mixing of two to four(?) active flows (is this
achievable without the complete process of decoding and encoding again?)
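As a baseline for the question above, here is a minimal sketch of mixing only the most active flows in the decoded (PCM) domain. This is the full decode-mix-encode path that one would hope to avoid, not a compressed-domain method. Frame energy stands in for a real activity detector, samples are assumed to be 16-bit integers, and all names are hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of PCM samples."""
    return sum(s * s for s in frame) / len(frame)


def mix_most_active(frames, max_active=3):
    """frames: dict mapping flow id -> list of int16 samples (equal lengths).
    Returns (ids of the mixed flows, mixed frame clipped to 16 bits)."""
    if not frames:
        return [], []
    # Rank flows by energy, loudest first, and keep only the top few.
    ranked = sorted(frames, key=lambda fid: frame_energy(frames[fid]), reverse=True)
    active = ranked[:max_active]
    length = len(frames[active[0]])
    mixed = []
    for i in range(length):
        total = sum(frames[fid][i] for fid in active)
        mixed.append(max(-32768, min(32767, total)))  # saturate instead of wrapping
    return active, mixed


if __name__ == "__main__":
    frames = {"a": [1000, -1000], "b": [10, 10], "c": [30000, 30000], "d": [0, 0]}
    print(mix_most_active(frames, max_active=2))
```

Because only the selected flows are summed, the per-frame mixing cost depends on `max_active` rather than on the total number of participants, although every flow still has to be decoded to measure its energy in this sketch.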
[Raymond]: High quality is a given, but I would like to emphasize the
importance of low latency.
(1) It is well-known that the longer the latency, the lower the perceived
quality of the communication link. [...]
(2) The lower the latency, the less audible the echo, and thus the lower
the required echo return loss. Hence, lower latency means easier echo
control and a simpler echo canceller. As people have already mentioned,
below a certain delay an echo is simply perceived as a harmless side-tone
and no echo canceller is needed. It seems to me that echo control in
conference calls is more difficult than in point-to-point calls. While I
have hardly ever heard echoes in domestic point-to-point calls, in my
experience with conference calls at work, even with the G.711 codec (which
has almost no delay), I still sometimes hear echoes (I just heard
another one this afternoon). If a relatively long-delay IETF codec is
used, the echo control will be even more problematic.
(3) In normal phone calls or conference calls, people routinely have a
need to interrupt each other, but beyond a certain point, long latency
makes it very difficult for people to interrupt each other on the call.
This is because when you try to interrupt another person, that person
doesn’t hear your interruption until some time later, so he keeps
talking. When you hear that he has not stopped talking, you stop; then he
hears your interruption, so he stops. When you hear that he has stopped,
you start talking again, but by then he has also heard that you stopped
(due to the long delay), so he also starts talking again. The net result
is that with a long latency, when you try to interrupt him, you and he end
up stopping and starting at roughly the same time for a few cycles,
making it difficult to interrupt each other.
[Jean-Marc]:
The decoder complexity is very important. Not only because of the mixing
issue, but also because the decoder is generally not allowed to take
shortcuts to save on complexity (unlike the encoder). As for
compressed-domain mixing, as you say, it is not always available, but *if*
we can do it (even if only partially), then that can result in a "free"
reduction in decoder complexity for mixing.
--
Ticket URL: <http://trac.tools.ietf.org/wg/codec/trac/ticket/3#comment:1>
codec <http://tools.ietf.org/codec/>