Re: [AVTCORE] Comments on draft-westerlund-avtcore-rtp-simulcast-00

Magnus Westerlund <> Fri, 04 November 2011 09:58 UTC

Message-ID: <>
Date: Fri, 4 Nov 2011 10:58:34 +0100
From: Magnus Westerlund <>
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1
MIME-Version: 1.0
To: "Brandenburg, R. (Ray) van" <>
References: <> <> <>
In-Reply-To: <>
X-Enigmail-Version: 1.3.2
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
Cc: "" <>
Subject: Re: [AVTCORE] Comments on draft-westerlund-avtcore-rtp-simulcast-00
List-Id: Audio/Video Transport Core Maintenance <>


I have removed all points where there is nothing to follow up on. We will
take your comments into account when preparing an update.

On 2011-11-03 16:27, Brandenburg, R. (Ray) van wrote:
> -----Original Message----- From: Magnus Westerlund
> [] Sent: donderdag 3 november
> 2011 15:32 To: Brandenburg, R. (Ray) van Cc: Subject:
> Re: [AVTCORE] Comments on draft-westerlund-avtcore-rtp-simulcast-00
>> -          Section 3: This section talks quite a bit about the
>> relation between Simulcast and Scalable Coding and in which
>> situations one of the two is more suitable. I'm not sure why this
>> is relevant. Simulcast and Scalable Coding might be two techniques
>> that in some cases can be used to solve the same problem, the layer
>> on which they do this is completely different (codec-level versus
>> transport level). The fact that Scalable Coding might, in some
>> situations, be a better solution than Simulcast does not mean that
>> Simulcast should not be standardized to handle those situations. I
>> understand that you've written this mainly from a video
>> conferencing perspective, in which Scalable Coding might be used
>> more often, but in a lot of situations, such as IPTV, Scalable
>> Coding is just not an option due to its inherent complexity. In
>> these situations Simulcast might be able to solve some real issues,
>>  despite the fact that Scalable Coding might be able to solve these
>> issues more efficiently.
> I think layered encoding is both a codec level and a transport level
> thing. Just as in simulcast you need transport level methods for
> selecting the media streams/layers that you want to receive.
> I fully agree that there are use cases where one might be considered
> much more appropriate than the other. And that is part of our
> arguments for why it should be defined for usage also, even though we
> have layered coding specified.
> [RAY: I don't agree with you that layered coding is a transport-level
> thing (or maybe I don't understand your definition of layered coding.
> With layered coding I mean e.g. SVC). It might be that layered-coding
> requires some specific transport-level properties (such as
> alignment), however these are not unique to layered coding. Your
> draft (or specifically section 3) seems to imply that Simulcast can
> be considered an alternative to Scalable Coding and that you
> therefore have to argue why Simulcast is better in some situations.
> In my opinion, Scalable Coding is an alternative to creating multiple
> bitrates/resolutions (i.e. Adaptive Streaming) and not to Simulcast.
> My understanding is that Simulcast is a method to deliver those
> multiple bitrates, the same way it could be used to deliver Scalable
> Coding streams.]

Yes, SVC is one realization of layered encoding. What I am getting at is
that SVC, being an encoding technology, does have specific transport
requirements to enable its functionality in some systems. For SVC this
is either a media-aware network entity (MANE) that can remove layers
en route, or the use of multicast or other transport-layer methods to
select the encoded layers.

I will not strongly disagree with your characterization in the second
half of the paragraph. I wouldn't use "adaptive streaming", as we are
interested in real-time conversational use cases rather than content
delivery with somewhat relaxed source-to-receiver delay requirements.
But yes, what we are discussing are methods for realizing adaptive
media delivery in point-to-multipoint scenarios.

>> -          Section 6.2: How do you suggest the sequence numbers and
>> RTP timestamps are handled between multiple alternative streams?
>> Would it make things easier if these were aligned across multiple
>> streams?
> No, I don't believe in alignment. first of all the sequence numbers
> needs to progress with the packets actually sent for a media stream. 
> Encoding alternatives are quite likely to have different packet
> rates, especially for video as soon as one comes to a media bit-rate
> where some alternative requires multiple packets per video frame.
> When it comes to timestamp I don't see any need either. For the best
> robustness and simplicity one should simply skip using alignment and
> instead use the mechanism for synchronizing media clocks with each
> other that exist, i.e. RTCP SR and Rapid Synchronization of RTP flows
> (RFC 6051).
> [RAY: For the sequence numbers I agree, however about the timestamps
> I'm not so sure. In some cases it might be particularly useful to
> align timestamps. One example are areas where you need bit-level
> alignment (such as SVC and spatially segmented streams). For these
> situations NTP-based solutions such as RTCP SR are not accurate
> enough]

I would not completely dismiss timestamp alignment; however, it has
become clear on multiple occasions that mechanisms that attempt to rely
on more than the basic, well-defined behaviors of the RTP header fields
have encountered issues or resulted in limitations regarding which RTP
extensions they can interoperate with. So this is more to caution people
here.

I don't quite see what you mean by bit-level alignment in the
discussion of timestamps. To my knowledge the SVC payload format is
defined such that this isn't an issue. In SST, all layers are sent under
a single SSRC, in a single timestamp space, in one single media stream;
thus a common timeline is used for all NALUs in all layers. When using
MST, the layers will be in different media streams in different RTP
sessions. Here, whether or not the same timestamp values are used,
alignment still depends on mapping them to the common sender clock
represented by the NTP time. I think RTP has no issues mapping multiple
SSRCs' timestamp values, each with a random offset, to the same
time-base clock at a granularity equivalent to the precision of the NTP
values provided in the RTCP SR packets. Taking care, I think a precision
better than 1/2^30 of a second is possible. That is more than sufficient
to align frames.
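As a rough, non-authoritative sketch of that mapping (the function name,
clock rate, and SR values below are my own illustrative assumptions),
the RFC 3550 SR correlation works along these lines:

```python
# Sketch: mapping RTP timestamps from two SSRCs (with independent random
# offsets) onto a common NTP timeline, using the <NTP, RTP> pair carried
# in each stream's RTCP Sender Report (RFC 3550). Names are illustrative.

def rtp_to_ntp(rtp_ts, sr_ntp, sr_rtp, clock_rate):
    """Convert an RTP timestamp to NTP seconds using the latest SR pair."""
    # 32-bit wrap-around-safe difference between RTP timestamps
    delta = (rtp_ts - sr_rtp) & 0xFFFFFFFF
    if delta >= 0x80000000:          # timestamp is before the SR reference
        delta -= 0x100000000
    return sr_ntp + delta / clock_rate

# Two 90 kHz video streams with different random timestamp offsets,
# whose SRs reference the same sender clock:
ntp_a = rtp_to_ntp(123456 + 90000, sr_ntp=100.0, sr_rtp=123456, clock_rate=90000)
ntp_b = rtp_to_ntp(777000 + 90000, sr_ntp=100.0, sr_rtp=777000, clock_rate=90000)
assert ntp_a == ntp_b == 101.0  # frames map to the same wallclock instant
```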

If by bit-alignment you mean that you need to indicate that the content
of the packet starts at the 10th line from the top and 240 pixels in,
then I think using the timestamp for that is the wrong choice. Instead,
the payload formats should use the timestamp to correctly identify the
frame, and then use other explicit indicators for the offset or
relation.
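To illustrate the separation argued for here (the header layout below is
entirely hypothetical, not any real payload format):

```python
# Sketch: the RTP timestamp identifies the video frame, while an
# explicit (hypothetical) payload header carries the spatial offset of
# the packet's content within that frame.
import struct

def pack_tile_header(x_offset, y_offset):
    """Hypothetical 4-byte payload header: 16-bit x and y pixel offsets."""
    return struct.pack('!HH', x_offset, y_offset)

def unpack_tile_header(data):
    return struct.unpack('!HH', data[:4])

hdr = pack_tile_header(240, 10)   # "240 pixels in, 10th line from the top"
assert unpack_tile_header(hdr) == (240, 10)
```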

> [RAY: Sorry for not being any clearer. We have a different
> understanding of tiled streaming (or spatial segmentation as is
> probably more descriptive). The use case is not having four
> network-connected projectors. The use case is having a mobile device
> with a relatively low resolution (e.g. 480x320 pixels) being able to
> navigate through high resolution (4k/8k) content. By splitting the
> output of a single 4k camera into 'tiles' of for example 240x160
> pixels, and storing each of these as a separate stream, a client can
> request multiple streams simultaneously and reconstruct part of the
> original video (in this case it would request 4 'tiles'). When a user
> wants to navigate through the content (e.g. pan/zoom) it requests a
> different subset of tiles. As you can see, what we have here is a
> large number of tiles (streams), coming from the same source, with
> different clients requesting different subsets of these tiles. In
> that aspect, it is not that dissimilar from some of the use cases you
> describe in your draft.]
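For concreteness, the tile selection in the quoted use case could be
sketched like this (tile and viewport dimensions are taken from the
mail; the helper itself is my own illustration):

```python
# Sketch of the tile-selection step: given a 4k source split into
# 240x160 tiles, compute which tiles a 480x320 viewport needs.

TILE_W, TILE_H = 240, 160
SRC_W, SRC_H = 3840, 2160          # 4k source
COLS, ROWS = SRC_W // TILE_W, SRC_H // TILE_H   # 16 x 13 full tiles

def tiles_for_viewport(x, y, w=480, h=320):
    """Return (col, row) indices of all tiles overlapping the viewport."""
    c0, r0 = x // TILE_W, y // TILE_H
    c1, r1 = (x + w - 1) // TILE_W, (y + h - 1) // TILE_H
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]

# A viewport aligned to the tile grid needs exactly the 4 tiles mentioned:
assert len(tiles_for_viewport(480, 320)) == 4
# Panning off-grid temporarily overlaps more tiles:
assert len(tiles_for_viewport(500, 330)) == 9
```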

Thanks for the explanation; it was needed. I hadn't heard that usage of
"tiled" in the context of video before.

First of all, I think this is something new, and we should discuss the
most appropriate way, or possible ways, to transport it. I don't think
this belongs in the RTP simulcast draft as the use cases are currently
defined. The reason is that in all the simulcast cases we are
discussing, the alternative media streams try to represent the full
source, although at different quality levels in terms of SNR,
resolution, etc., while tiled video deals with the selection and
handling of multiple partial representations of a source.

I think one way of dealing with tiled video is to make each tile a
separate media stream, where each tile uses its own SSRC. Thus one could
potentially control on the SSRC level whether a stream is being
delivered or not. Another way of viewing the problem is to use only as
many SSRCs as you need simultaneously delivered tiles, and then use
something other than the SSRC to indicate which tile is currently being
delivered in each media stream.

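A minimal sketch of the first option, assuming one SSRC per tile (all
names and the SSRC handling are illustrative, not a real RTP stack):

```python
# Sketch: map each (col, row) tile to its own SSRC, so tile delivery
# can be controlled at the SSRC level. SSRC values are made up.
import random

class TileSession:
    """Hypothetical bookkeeping for per-tile SSRC delivery control."""
    def __init__(self, cols, rows):
        self.ssrc_of = {}
        used = set()
        for r in range(rows):
            for c in range(cols):
                ssrc = random.getrandbits(32)
                while ssrc in used:        # SSRCs must be unique (RFC 3550)
                    ssrc = random.getrandbits(32)
                used.add(ssrc)
                self.ssrc_of[(c, r)] = ssrc
        self.active = set()

    def request(self, tiles):
        """Receiver asks for a new tile subset; return SSRCs to start/stop."""
        wanted = set(tiles)
        start = {self.ssrc_of[t] for t in wanted - self.active}
        stop = {self.ssrc_of[t] for t in self.active - wanted}
        self.active = wanted
        return start, stop

s = TileSession(16, 13)
start, stop = s.request([(2, 2), (3, 2), (2, 3), (3, 3)])
assert len(start) == 4 and not stop
# Panning one tile to the right: start two new tiles, stop two old ones.
start, stop = s.request([(3, 2), (4, 2), (3, 3), (4, 3)])
assert len(start) == 2 and len(stop) == 2
```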
I think this is an interesting problem; if you are interested in
standardizing a recommended way of handling tiled video, you should
create a separate Internet-Draft to discuss it.


Magnus Westerlund

Multimedia Technologies, Ericsson Research EAB/TVM
Ericsson AB                | Phone  +46 10 7148287
Färögatan 6                | Mobile +46 73 0949079
SE-164 80 Stockholm, Sweden| mailto: