Re: [rtcweb] Support of video with different resolutions

Randell Jesup <randell-ietf@jesup.org> Mon, 31 December 2012 16:55 UTC

Message-ID: <50E1C36A.8000300@jesup.org>
Date: Mon, 31 Dec 2012 11:55:06 -0500
From: Randell Jesup <randell-ietf@jesup.org>
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20121026 Thunderbird/16.0.2
MIME-Version: 1.0
To: rtcweb@ietf.org
References: <7594FB04B1934943A5C02806D1A2204B06E211@ESESSMB209.ericsson.se> <BLU002-W15E73B3AAD5DBC4D12C5DC93360@phx.gbl> <50D4F06D.3020602@omnitor.se>
In-Reply-To: <50D4F06D.3020602@omnitor.se>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 8bit

On 12/21/2012 6:27 PM, Gunnar Hellström wrote:
> On 2012-12-21 16:40, Bernard Aboba wrote:
>> SVC supports multiple scalability mechanisms: temporal, spatial and 
>> quality. Temporal is the most widely implemented in software (at 
>> least for H.264/SVC), although there is an open source implementation 
>> that supports all modes. In hardware, support for spatial or quality 
>> scaling is not common. As a result, a browser on a mobile device 
>> could have difficulty complying with a mandate to support spatial 
>> scaling within the encoder.
>>
> This is sad. For good usability of video, maintained frame rate is 
> usually much more important than maintained spatial resolution. E.g. 
> for sign language or lip reading usage with a single person in image, 
> a frame rate under 20 fps introduces loss of language contents, and 
> requires the users to try to fill in the gaps by imagination, while 
> spatial resolution reduction down to QCIF causes much less harm to 
> language perception possibilities.

Strong agreement. High (and consistent!) frame rate is critical to VRS
(Video Relay Service) and sign language - and for hearing people, it
'feels' much more like talking to someone if the video frame rate is
kept above 20 FPS (and preferably above 25), with perfect lip-sync. This
is based on the experience Maire Reavy and I had at Worldgate, where we
provided videophones for VRS usage. 30FPS QCIF works pretty well for
sign language, and generally better than 20FPS CIF, even though CIF has
4x the number of pixels.
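
For a rough sense of the numbers, here is a small sketch comparing raw
pixel throughput for the two modes. It assumes the standard 176x144
(QCIF) and 352x288 (CIF) frame sizes; the helper name is only
illustrative, and raw pixel rate is of course not the same thing as
encoded bitrate.

```python
# Standard frame sizes (width, height) in pixels.
QCIF = (176, 144)
CIF = (352, 288)


def pixels_per_second(size, fps):
    """Raw pixel throughput for a frame size at a given frame rate."""
    width, height = size
    return width * height * fps


qcif_30 = pixels_per_second(QCIF, 30)
cif_20 = pixels_per_second(CIF, 20)

print(qcif_30)  # 760320
print(cif_20)   # 2027520
```

So 30FPS QCIF moves well under half the raw pixels per second of 20FPS
CIF, while keeping the higher frame rate.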

Note that there's no reason a browser in a mobile device couldn't 
maintain 30fps, and if congestion indicates it needs to reduce bandwidth 
too much for 'good quality' 30FPS, it can instead reduce the resolution. 
The only time I'd value resolution over framerate is when viewing 
graphics or the equivalent. SVC certainly isn't needed here for 1-1 
communication; a main advantage of SVC is to allow a node in the network 
(i.e. conf. server/MCU) to subset down a stream for one recipient while 
providing higher-resolution streams to others while avoiding 
decode/re-encode on the server. Of course, this was the use-case that 
started this.
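
A minimal sketch of the "hold the frame rate, shrink the picture"
policy described above. The resolution ladder, the bits-per-pixel
threshold for 'good quality', and the function name are all assumptions
made for illustration; a real sender would take these from its encoder
and congestion controller.

```python
# Hypothetical resolution ladder, largest first: (name, width, height).
LADDER = [("VGA", 640, 480), ("CIF", 352, 288), ("QCIF", 176, 144)]

TARGET_FPS = 30
# Assumed, illustrative threshold for 'good quality'; not a measured value.
MIN_BITS_PER_PIXEL = 0.1


def pick_resolution(available_bps, fps=TARGET_FPS):
    """Return the largest ladder entry that fits the budget at full frame rate."""
    for name, width, height in LADDER:
        needed_bps = width * height * fps * MIN_BITS_PER_PIXEL
        if needed_bps <= available_bps:
            return name, width, height
    # Nothing fits: stay at the smallest size instead of cutting frame rate.
    return LADDER[-1]


for budget in (1_000_000, 500_000, 100_000):
    print(budget, pick_resolution(budget)[0])
```

The frame rate never appears as a variable that congestion can lower;
only the resolution steps down as the available bandwidth shrinks.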

As a corollary, frame rate should not drop when there's a lot of
movement, or should only drop momentarily. (And I mean very momentarily, 
like dropping a single frame when there's a big jump in motion from the 
last frame, then resuming 30FPS at higher quantization or higher bitrate.)

However, Christer's original question said:
"However, the “local transcoding” alternative will consume lots of 
resources, and as far as I know none of the proposed MTI video codecs 
support SVC."

This is, I believe, at least partially incorrect. From
blog.webmproject.com, 1/27/2012, "VP8 Codec SDK "Duclair" Released":

    This release introduces substantial new VP8 encoder features that
    are especially useful for real-time use cases such as live streaming
    and videoconferencing.

      * Temporal scalability produces a video stream that can be
        decimated to different frame rates, with independent rate
        targeting for each substream.
      * Multiframe postprocessing can make visual quality more
        consistent in the presence of frames that are of substantially
        different quality than the surrounding frames, as in the
        temporal scalability case and in some forced keyframe scenarios.
      * Multiple-resolution encoding enables simultaneous encoding of
        the same content at different resolutions, resulting in much
        faster encoding than processing them separately.

This provides both multiple-resolution encoding (simulcast) and
temporal scalability.
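
To illustrate what "decimated to different frame rates" means in the
temporal scalability bullet, here is a sketch using a common dyadic
three-layer pattern. The layer pattern and function names are
assumptions for illustration only; this is not the VP8 SDK's API, and
the encoder may use a different layer structure.

```python
def temporal_layer(frame_index, num_layers=3):
    """Layer id for a frame in a dyadic hierarchy (0 = base layer)."""
    period = 1 << (num_layers - 1)  # 4 frames for 3 layers
    pos = frame_index % period
    if pos == 0:
        return 0
    # Count trailing zero bits of pos: fewer zeros means a higher layer.
    trailing_zeros = (pos & -pos).bit_length() - 1
    return num_layers - 1 - trailing_zeros


def decimate(frame_indices, max_layer, num_layers=3):
    """Keep only frames at or below max_layer, as a middlebox might."""
    return [i for i in frame_indices
            if temporal_layer(i, num_layers) <= max_layer]


frames = list(range(8))
print([temporal_layer(i) for i in frames])  # [0, 2, 1, 2, 0, 2, 1, 2]
print(decimate(frames, 0))                  # [0, 4]
print(decimate(frames, 1))                  # [0, 2, 4, 6]
```

With a 30FPS source, forwarding all layers gives 30FPS, dropping the
top layer gives 15FPS, and forwarding only the base layer gives 7.5FPS,
all without re-encoding.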

-- 
Randell Jesup
randell-ietf@jesup.org