[rtcweb] RTCWEB / CLUE Use Cases for Browser - Telepresence Interoperability

Marshall Eubanks <marshall.eubanks@gmail.com> Fri, 13 April 2012 18:14 UTC

I am posting this to both CLUE and RTCWEB. I am not sure which (if
either) WG would want to adopt it, but I think both should see it. If
this gains traction, I will make sure both WGs are kept abreast of
developments.

Regards
Marshall

RTCWEB / CLUE Use Cases for Browser - Telepresence Interoperability

References :

RTCWEB Use Cases and Requirements :
http://www.ietf.org/id/draft-ietf-rtcweb-use-cases-and-requirements-07.txt

CLUE : Use Cases for Telepresence Multi-streams
http://www.ietf.org/id/draft-ietf-clue-telepresence-use-cases-02.txt

What follows is a first cut at use cases for interoperability between
a browser-based videoconferencing system and telepresence
videoconferencing. This is an attempt to describe what should be done,
without specifying in detail how. I have seen all of the below in the
field, with Cases 1-T, 1-R c and 2-R b being the most widely
supported. These cases are similar to the use case in Section 4.2.10
of draft-ietf-rtcweb-use-cases-and-requirements, but in the
telepresence scenario there is, of course, a set of CLUE semantics
overlaid on the basic display of screens. In this scenario, the
browser is more likely to require higher-level protocol changes than
the telepresence units, but the telepresence units or middleware MUST
still be able to accommodate RTCWEB codec and signaling choices.

Base scenario. There are one or more browser-based videoconferencing
systems and one or more telepresence systems (each with at least two
cameras and two screens) participating in an immersive-telepresence
videoconferencing session. This may or may not require middleware (a
server or MCU); use cases both with and without middleware SHOULD be
supported. The telepresence units themselves are assumed to use CLUE
and underlying protocols to decide suitable bit rates, resolutions,
etc., among themselves; intra-telepresence negotiations are not in
scope for this text. In telepresence, "screen" and "stream" are used
more or less interchangeably, and they are used so here. In the base
scenario, there are thus at least 3 screens (streams): at least one
from a browser and at least two from a telepresence unit. Conferences
with 9 or more streams are not unusual, and conferences with dozens or
even hundreds of streams are a commercial reality.

It is useful to divide use cases between Transmission (T, what the
browser sends) and Reception (R, what it receives). Browser-based
units SHOULD be able to choose their transmission and reception use
cases independently, depending on capabilities and user choices.

Case 1 : The one screen use case.

In Case 1, it is possible for a CLUE telepresence session and a
browser to interact without any knowledge of CLUE on the part of the
browser (i.e., with a telepresence unit or middleware acting as a
single-point videoconferencing unit). Such "Clueless" conferencing
sessions are not in scope for this text.

Case 1-Transmission : The browser sends one screen at the screen
characteristics of the participating telepresence units or, failing
that, at the maximum resolution and bandwidth it is capable of or
authorized to use. Likewise, audio SHOULD be sent at the bit rate
and using the codec negotiated by the CLUE telepresence session. Case
1 audio SHOULD be sent in mono. The browser MUST be able to use CLUE
to negotiate these characteristics (which may change with time), and
SHOULD be able to provide whatever metadata is required by CLUE
(e.g., the user's name or location).

The browser may occupy one complete screen in the remote telepresence
units. If care is taken by the browser user and software (for example,
by matching head size and camera angle to the conventions of the
participating telepresence units, together with sufficient audio and
video quality and resolution), the browser image and sound may
approach immersive telepresence at the remote ends. (It would be
useful if the system provided feedback, or even automated zoom in/out,
to help with this adjustment for remote immersion. I have seen this
done manually, but to my knowledge it has not been discussed to date
in CLUE.)

In the case of lower-quality browser-transmitted video, the receiving
telepresence units may choose to display such videos in composited
form, with multiple browser transmissions sharing one screen. (This is
common with low-quality video from multiple browser-based
participants.)

Case 1-Reception : The browser displays one screen for all of the
remote telepresence units. The browser receives one screen at the
screen characteristics of the participating telepresence units or,
failing that, at the maximum resolution and bandwidth it is capable
of. Likewise, it SHOULD be able to receive audio at the bit rate and
using the codec negotiated by the CLUE telepresence session. The
browser MUST be able to negotiate these characteristics (which may
change with time) using CLUE.

In all Case 1-R sub cases, the browser MAY also display "thumbnail"
(i.e., substantially reduced) images of other screens. These
thumbnails might be shown for all screens, or for recently active
screens (e.g., the last speaker), or for some static choice (e.g., the
conference chair). The browser SHOULD be able to signal its desire /
need to receive such thumbnails.

Sub Case 1-R a : Static. The browser displays one screen only, as
selected by the user or by some other method. (A simultaneous
translator, for example, may prefer only to see the screen attached to
their assigned speaker, regardless of whether or not they are
speaking.)

Sub Case 1-R b : Switching. The browser displays the "active" stream,
typically that of the current speaker. The browser MUST be able to
switch rapidly between resolutions and other stream configuration
choices, for example when the active stream switches between a
telepresence unit and another browser's transmission.

Sub Case 1-R c : Compositing. The browser displays one screen,
consisting of a static compositing of all (or conceivably a selection)
of the other screens. This MAY indicate the active speaker (say, by
highlighting them) and MAY display composited metadata (such as the
attendees in each sub-screen, or their location). (If this display is
an NxM matrix of equal size sub-screens, this display type is
frequently called "Hollywood Squares," but other choices are
possible.) In general this will be a static screen assignment, which
MAY change with time (e.g., as participants come and go from the
conference). This compositing could be done by the browser, but in
current practice will most likely be done by either a telepresence
unit or by middleware.
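As an illustration of the "Hollywood Squares" layout, here is a
minimal Python sketch of how a compositor (whether the browser, a
telepresence unit, or middleware) might divide one output screen into
an NxM matrix of equal-size sub-screens. The names `Rect`,
`hollywood_squares` and `grid_for`, and the choice of a near-square
grid, are hypothetical and are not defined by CLUE or RTCWEB.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """A sub-screen rectangle in output-screen pixels (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def grid_for(participants: int) -> tuple[int, int]:
    """Pick a near-square (rows, cols) grid large enough for all participants."""
    if participants < 1:
        raise ValueError("need at least one participant")
    cols = math.ceil(math.sqrt(participants))
    rows = math.ceil(participants / cols)
    return rows, cols


def hollywood_squares(screen_w: int, screen_h: int, rows: int, cols: int) -> list[Rect]:
    """Split one screen into rows x cols equal tiles, row by row, left to right."""
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be positive")
    tile_w = screen_w // cols  # integer division: leftover pixels stay unused
    tile_h = screen_h // rows
    return [
        Rect(c * tile_w, r * tile_h, tile_w, tile_h)
        for r in range(rows)
        for c in range(cols)
    ]


if __name__ == "__main__":
    rows, cols = grid_for(9)
    for tile in hollywood_squares(1920, 1080, rows, cols):
        print(tile)
```

For nine streams on a 1920x1080 screen this yields a 3x3 grid of
640x360 tiles. As participants come and go, the compositor would
recompute the grid, matching the static assignment that MAY change
with time.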

Case 2 : Multiple Screens. The browser sends and/or receives multiple
screens at the maximum resolution and bandwidth it is capable of or
authorized to use. (This case is NOT intended to cover "thumbnail"
sub-screens, which may also be sent or received in this case.)

Case 2-Transmission. The browser has access to multiple cameras and
sends multiple images to remote participants. If care is taken by the
browser user / software, the browser images may approach immersive
telepresence at the remote ends. For this to happen in the
multi-screen transmission case, the browser MUST be able to fully
participate in CLUE protocol negotiations, identifying, for example,
which screen is left, center and right in the case of a three-camera
transmission.

In Case 2-T, the browser SHOULD send stereo or multi-channel audio and
SHOULD format any such audio to match the transmitted screens, e.g.,
with the left audio corresponding to the left screen. Such audio
choices, if made, MUST be indicated in the CLUE configuration setup.
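One way to picture the Case 2-T requirement that audio match the
transmitted screens is a small table, built by the sender, that pairs
each camera's spatial position with a stereo pan value. The minimal
Python sketch below assumes cameras are indexed left to right and
spreads pan values evenly between -1.0 (full left) and +1.0 (full
right). `CaptureEntry`, `pan_positions` and `describe_captures` are
hypothetical names, and the actual CLUE encoding of this information
is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureEntry:
    """Hypothetical per-camera record a browser might report during CLUE setup."""
    index: int        # 0 = leftmost camera/screen
    position: str     # human-readable spatial label
    audio_pan: float  # -1.0 = full left, 0.0 = center, +1.0 = full right


# Labels for the common layouts; other counts fall back to numbered positions.
_LABELS = {
    1: ("center",),
    2: ("left", "right"),
    3: ("left", "center", "right"),
}


def pan_positions(num_screens: int) -> list[float]:
    """Evenly spaced pan values from left (-1.0) to right (+1.0)."""
    if num_screens < 1:
        raise ValueError("need at least one screen")
    if num_screens == 1:
        return [0.0]
    step = 2.0 / (num_screens - 1)
    return [-1.0 + i * step for i in range(num_screens)]


def describe_captures(num_screens: int) -> list[CaptureEntry]:
    """Pair each screen, left to right, with a position label and an audio pan."""
    labels = _LABELS.get(
        num_screens, tuple(f"position-{i}" for i in range(num_screens))
    )
    return [
        CaptureEntry(i, labels[i], pan)
        for i, pan in enumerate(pan_positions(num_screens))
    ]
```

With three cameras this yields left, center and right entries panned
to -1.0, 0.0 and +1.0, i.e. the "left audio corresponds to the left
screen" pairing that the browser is asked to indicate in its CLUE
configuration.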

Case 2-Reception. The browser displays multiple screens based on the
multiple screens available from other telepresence participants.

Sub Case 2-R a : Switched compositing. In this use case, typically one
composited screen is shown, of some or all of the telepresence
participants, together with one or more full-sized screens of the
active participants (or, at times, of one static screen). Note that
the composited screens may have higher sub-screen resolution than
thumbnails, or may have very different aspect ratios than is typical
for thumbnails. For example, a three-screen telepresence unit, with an
individual screen aspect ratio of 16:9, may transmit a 48:9 aspect
ratio composite of all its screens in the proper display order.
Several such composites (for example, three stacked vertically) could
be combined into a single 16:9 image to make the composited screen in
this use case. This use case also covers multiple screens for selected
speakers, for example a composited screen on the left, the conference
moderator at higher resolution in the center, and (also at higher
resolution) the current or previous speaker (should the moderator be
currently speaking) on the right. In Case 2-R a, the browser SHOULD
display screens and use stereo or multi-channel audio conforming to
the CLUE configuration, with, in the above example, the conference
moderator assigned the center audio channel and the current or
previous speaker the right audio channel.
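The aspect-ratio arithmetic in the 2-R a example can be checked with
exact fractions. The minimal Python sketch below (helper names are
hypothetical) assumes equal-size 16:9 screens: placing screens side by
side multiplies the width term, and stacking composites vertically
multiplies the height term, so three 48:9 composites stacked
vertically give 48:27, which reduces to 16:9.

```python
from __future__ import annotations

from fractions import Fraction

SCREEN = Fraction(16, 9)  # aspect ratio (width / height) of one telepresence screen


def side_by_side(ratio: Fraction, count: int) -> Fraction:
    """Aspect ratio of `count` equal images placed in one row."""
    return ratio * count


def stacked(ratio: Fraction, count: int) -> Fraction:
    """Aspect ratio of `count` equal images placed in one column."""
    return ratio / count


def tile_size(out_w: int, out_h: int, across: int, down: int) -> tuple[int, int]:
    """Pixel size of each original screen once the grid is scaled to out_w x out_h."""
    return out_w // across, out_h // down


unit_composite = side_by_side(SCREEN, 3)  # 48:9, which Fraction stores as 16/3
combined = stacked(unit_composite, 3)     # 48:27 == 16:9
print(unit_composite, combined)           # prints "16/3 16/9"
print(tile_size(1920, 1080, across=3, down=3))  # prints "(640, 360)"
```

So when three such composites share a 1920x1080 screen, each original
telepresence screen occupies 640x360 pixels.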

Sub Case 2-R b : Telepresence mimicry. The browser displays screens as
if it were a telepresence unit involved in the telepresence session.
The browser MUST be able to fully participate in the CLUE protocol,
displaying, for example, left screens on the left and center screens
in the center. In general, these screens will be displayed at the same
aspect ratio as in the full telepresence session, but at lower size /
resolution. Experience shows that, for participants who are used to
multi-screen telepresence, even lower-quality telepresence mimicry is
very popular, and RTCWEB / CLUE SHOULD support this full telepresence
functionality. At the high end of browser capabilities, the full
available resolution and bit rates could be used to approach truly
immersive telepresence on the browser receive side; this is likely to
become more common in the future.

In Case 2-R b, the browser SHOULD display screens and play stereo or
multi-channel audio conforming to the CLUE configuration, with, e.g.,
the left audio channel corresponding to the left screen, etc.
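For Case 2-R b, the display side of telepresence mimicry amounts to
laying out the remote screens in their CLUE order, left to right, at
the telepresence aspect ratio but reduced in size to fit the browser
window. The minimal Python sketch below assumes a single row of equal
16:9 screens centered in the viewport. `Placement` and `mimic_layout`
are hypothetical names, and integer pixel rounding means the aspect
ratio is preserved only approximately.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where one remote screen is drawn in the browser viewport (hypothetical)."""
    x: int
    y: int
    width: int
    height: int


def mimic_layout(view_w: int, view_h: int, num_screens: int,
                 aspect_w: int = 16, aspect_h: int = 9) -> list[Placement]:
    """Place num_screens equal screens in one centered row, leftmost first."""
    if num_screens < 1:
        raise ValueError("need at least one screen")
    # Start from the widest screens that let the whole row fit horizontally.
    width = view_w // num_screens
    height = width * aspect_h // aspect_w
    # If that is too tall for the viewport, let the height limit the size instead.
    if height > view_h:
        height = view_h
        width = height * aspect_w // aspect_h
    x0 = (view_w - width * num_screens) // 2  # center the row horizontally
    y0 = (view_h - height) // 2               # center the row vertically
    return [Placement(x0 + i * width, y0, width, height) for i in range(num_screens)]


if __name__ == "__main__":
    for p in mimic_layout(1920, 1080, 3):
        print(p)
```

Pairing each placement's index with the corresponding left-to-right
audio channel then gives the screen/audio correspondence described
for Case 2-R b.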