[clue] Definition and examples: rendering and rendering type

Christer Holmberg <christer.holmberg@ericsson.com> Mon, 20 June 2011 05:24 UTC

From: Christer Holmberg <christer.holmberg@ericsson.com>
To: "clue@ietf.org" <clue@ietf.org>
Date: Mon, 20 Jun 2011 07:23:57 +0200

Hi,

There have been some questions on what we mean by "rendering" and "rendering type".

Together with my colleagues, I have tried to put together some simple definition text, and some examples, that I hope will clarify things.


DEFINITIONS
-----------

Rendering:   The procedure by which an entity uses one or more input media signals to generate an output media signal, using a specific algorithm.

Rendering Type:   A unique tag associated with a specific algorithm used to perform central rendering.




Audio rendering algorithms
--------------------------

An audio rendering algorithm is a linear transformation that transforms multiple input audio signals into a multi-channel output audio signal. There are three main types of linear rendering transformation: plain mixing, panning, and 3D binaural.
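As an illustration (this sketch is mine, not from the definitions above), such a linear transformation can be written as a gain matrix applied to the stacked input channels; the function and variable names are illustrative:

```python
import numpy as np

def render_linear(inputs, gains):
    """Apply a linear rendering transformation.

    inputs: list of 1-D sample arrays, one per input channel.
    gains:  (n_out, n_in) matrix mapping input channels to output channels.
    Returns an (n_out, n_samples) array of output channels.
    """
    x = np.stack(inputs)   # shape (n_in, n_samples)
    return gains @ x       # each output channel is a weighted sum of inputs

# Two mono inputs mixed equally into one mono output:
out = render_linear([np.array([1.0, 2.0]), np.array([3.0, 4.0])],
                    np.array([[0.5, 0.5]]))
```

Plain mixing, panning, and 3D binaural rendering then differ only in how the gain matrix is chosen.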

Video rendering algorithms
--------------------------

A video rendering algorithm is a linear transformation that transforms multiple input video signals into an output video signal. Linear rendering transformations similar to those for audio can be applied. Panning and 3D rendering transformations are the most useful for video.

Non-linear rendering
--------------------

For both audio and video, the linear algorithms can be combined with or complemented by non-linear aspects, such as thresholding that excludes "weak" (less active) signals from the rendered output. It is also possible to let a controller (user, conference owner, ...) influence the rendering algorithm in a number of ways, e.g. by controlling which inputs are included in the rendering.

It would also be possible to describe the rendering algorithms as "linear or non-linear" and include the aspects above in the algorithms themselves.


ALGORITHM EXAMPLES
------------------

Example 1: Plain mixing

The simplest audio transformations are the plain mixing transformations. In the case of a mono output signal with mono and stereo input signals, the plain mixing transformation is simply the sum of all the input signal channels. Similarly, in the case of a stereo output with mono and stereo input signals, the left output signal is the sum of all the left input signals and the scaled mono signals, and the right output signal is the sum of all the right input signals and the scaled mono signals.
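A minimal sketch of the stereo plain mix described above (not from the mail; it assumes stereo inputs are (left, right) pairs of sample arrays, and the mono gain value is illustrative):

```python
import numpy as np

def plain_mix_stereo(stereo_inputs, mono_inputs, mono_gain=0.5):
    """Plain stereo mix: each output channel is the sum of the matching
    stereo input channels plus the scaled mono signals.

    stereo_inputs: list of (left, right) pairs of 1-D sample arrays.
    mono_inputs:   list of 1-D sample arrays.
    """
    mono_sum = mono_gain * sum(mono_inputs)
    left = sum(s[0] for s in stereo_inputs) + mono_sum
    right = sum(s[1] for s in stereo_inputs) + mono_sum
    return left, right
```

The mono gain here is a free parameter; a deployment would pick it to balance mono sources against stereo ones.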

For video, it would also be possible to do plain mixing but that would create an output that superimposes the inputs, which is not a particularly useful way of presenting multiple video media.

Example 2: Panning

The algorithms that perform audio panning and 3D binaural transformations are considerably more sophisticated than plain mixing. They try to position each of the input signal channels, according to some positional layout, in the rendered audio space that is generated when the multi-channel output signal is played through a particular loudspeaker configuration.
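For concreteness, one common (though by no means the only) way to position a source in a stereo pan is constant-power panning; this toy sketch is mine, not from the mail:

```python
import math

def pan_gains(azimuth):
    """Constant-power stereo pan.

    azimuth: position in [-1.0, 1.0], where -1 is hard left,
             0 is center, and +1 is hard right.
    Returns (left_gain, right_gain); the squared gains always sum to 1,
    so perceived loudness stays constant as the source moves.
    """
    theta = (azimuth + 1.0) * math.pi / 4.0   # map [-1, 1] to [0, pi/2]
    return math.cos(theta), math.sin(theta)
```

Each input channel gets its own gain pair from its layout position, and the per-channel results are then summed as in plain mixing. 3D binaural rendering replaces these simple gains with head-related filtering, which is well beyond a short sketch.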

Video panning is the most common rendering transformation. Input signals are mapped onto a 2D rendering layout, where individual input signals can be represented in the output signal with different positions and sizes. The video counterpart of 3D binaural audio rendering could be a 3D video rendering that maps the input signals into a 3D rendering layout. In that case, the input video signals could themselves be either 2D or 3D.
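The 2D layout mapping can be sketched as compositing scaled frames onto an output canvas; again, this is an illustration of the idea, not text from the mail (grayscale frames are used to keep the sketch short):

```python
import numpy as np

def compose(canvas_shape, placements):
    """Compose input video frames onto a 2-D rendering layout.

    canvas_shape: (height, width) of the output frame.
    placements:   list of (frame, top, left) tuples, where each frame is a
                  2-D grayscale array already scaled to its display size.
    Later placements overdraw earlier ones where they overlap.
    """
    canvas = np.zeros(canvas_shape)
    for frame, top, left in placements:
        h, w = frame.shape
        canvas[top:top + h, left:left + w] = frame
    return canvas
```

Position and size per input are exactly the degrees of freedom the paragraph above describes; a real renderer would add scaling, color, and overlap policy on top of this.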

Example 3: Thresholding

For audio, input thresholding can exclude weak audio signals from a mix or pan, and thereby reduce the noise level of the resulting output. It can also limit the number of allowed input sources, e.g. mixing only the N most important inputs into the output. Which inputs are most important can vary, but they are typically chosen as the "most active", where the algorithm for the activity measure can also vary. That measure will likely not need to be standardized.
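A small sketch of this selection step (mine, not from the mail; the activity measure is deliberately left as an opaque input, since the mail notes it need not be standardized):

```python
def select_top_n(inputs, activity, n, floor=0.0):
    """Keep at most n inputs, ranked by activity, dropping any whose
    activity does not exceed the floor threshold.

    inputs:   list of input signal handles (opaque here).
    activity: per-input activity scores, same length as inputs.
    """
    ranked = sorted(zip(activity, inputs), key=lambda p: p[0], reverse=True)
    return [sig for act, sig in ranked[:n] if act > floor]
```

With n=1 this degenerates into the pure input switching mentioned for video below.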

For video, it is similarly straightforward to limit the number of input signals, N, that are included in the output mix or pan. When N=1, this becomes a simple switching of input source. Input signal importance is typically derived from the importance of the related audio stream, i.e. "speaker activity"-based switching.


Regards,

Christer