Re: [AVTCORE] Comments on draft-westerlund-avtcore-rtp-simulcast-00

Magnus Westerlund <> Thu, 03 November 2011 14:31 UTC

Return-Path: <>
Received: from localhost (localhost []) by (Postfix) with ESMTP id AA6E411E80EC for <>; Thu, 3 Nov 2011 07:31:38 -0700 (PDT)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: -106.579
X-Spam-Status: No, score=-106.579 tagged_above=-999 required=5 tests=[AWL=0.020, BAYES_00=-2.599, RCVD_IN_DNSWL_MED=-4, USER_IN_WHITELIST=-100]
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id mS3LH0a98rl3 for <>; Thu, 3 Nov 2011 07:31:37 -0700 (PDT)
Received: from ( []) by (Postfix) with ESMTP id 5C38311E80B4 for <>; Thu, 3 Nov 2011 07:31:37 -0700 (PDT)
X-AuditID: c1b4fb39-b7cb2ae000001bd8-e3-4eb2a5c89fb6
Received: from (Unknown_Domain []) by (Symantec Mail Security) with SMTP id 49.B0.07128.8C5A2BE4; Thu, 3 Nov 2011 15:31:36 +0100 (CET)
Received: from [] ( by ( with Microsoft SMTP Server id; Thu, 3 Nov 2011 15:31:36 +0100
Message-ID: <>
Date: Thu, 3 Nov 2011 15:31:34 +0100
From: Magnus Westerlund <>
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:7.0.1) Gecko/20110929 Thunderbird/7.0.1
MIME-Version: 1.0
To: "Brandenburg, R. (Ray) van" <>
References: <>
In-Reply-To: <>
X-Enigmail-Version: 1.3.2
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 8bit
X-Brightmail-Tracker: AAAAAA==
Cc: "" <>
Subject: Re: [AVTCORE] Comments on draft-westerlund-avtcore-rtp-simulcast-00
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Audio/Video Transport Core Maintenance <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Thu, 03 Nov 2011 14:31:38 -0000


Thanks for the review. Some replies inline.

On 2011-10-26 09:27, Brandenburg, R. (Ray) van wrote:
> Hi Magnus,
> I reviewed your draft draft-westerlund-avtcore-rtp-simulcast-00 and have
> some questions/comments:
> -          Section 3: This section lacks a summary/conclusion. A number
> of different scenarios are introduced, some of which are explicitly
> stated to be out-of-scope (e.g. 3.5), and some are stated to be in-scope
> (e.g. 3.3). Some scenarios however (e.g. 3.2) are not specified to be
> either in- or out-of-scope.

We consider multicast in scope, but with less importance than the mixer-based
scenarios.

> -          Section 3: This section talks quite a bit about the relation
> between Simulcast and Scalable Coding and in which situations one of the
> two is more suitable. I’m not sure why this is relevant. Simulcast and
> Scalable Coding might be two techniques that in some cases can be used
> to solve the same problem, but the layer on which they do this is completely
> different (codec level versus transport level). The fact that Scalable
> Coding might, in some situations, be a better solution than Simulcast
> does not mean that Simulcast should not be standardized to handle those
> situations. I understand that you’ve written this mainly from a video
> conferencing perspective, in which Scalable Coding might be used more
> often, but in a lot of situations, such as IPTV, Scalable Coding is just
> not an option due to its inherent complexity. In these situations
> Simulcast might be able to solve some real issues, despite the fact that
> Scalable Coding might be able to solve these issues more efficiently.

I think layered encoding is both a codec-level and a transport-level
thing. Just as in simulcast, you need transport-level methods for
selecting the media streams/layers that you want to receive.

I fully agree that there are use cases where one might be considered
much more appropriate than the other. That is part of our argument for
why simulcast should be defined for usage as well, even though layered
coding is already specified.

> -          Section 3.1.1 See comment above. Why is this section relevant
> to the rest of the document?

See above. But will attempt to clarify this.

> -          Section 3.2.1: This scenario is the only one of the scenarios
> in section 3 to make a clear pro/con of SSRC versus Session based
> multiplexing. Furthermore, it does so before the actual discussion of
> SSRC vs. Session-based multiplexing is introduced in section 4.

Ok, for consistency we should likely move the pros and cons to the
analysis section.

> -          Section 3.3: For the tiled streaming use case described
> below, this scenario is especially relevant. Is there any technical
> reason why it has less emphasis than the RTP-Mixer scenario?

If my understanding of tiled streaming is correct, I don't see how this
scenario maps onto tiling. Please elaborate.

> -          Section 5.5: Your conclusion here is that Session Based
> Multiplexing is the best choice, since SSRC multiplexing seems to
> require a large amount of extensions in order to work. How does this
> conclusion relate to the draft-lennox-rtcweb-rtp-media-type-mux-00 which
> seems to suggest that SSRC multiplexing is not that difficult? (I should
> note that I have no experience with sending multiple streams inside a
> single RTP session, so it could be that I don’t understand the issues
> correctly)

If you review the RTP multiplexing architecture, you are likely to realize
that there are a number of cases where I consider SSRC-multiplexed media
streams to be the best and most appropriate choice. However, it is
important that one considers the use case. So this document argues that
for simulcast the best choice is to put the simulcast versions in
different RTP sessions. However, if one has multiple media sources, one
should in fact use multiple SSRCs in each of these sessions to carry the
different media sources.

And in sessions where all media streams to be sent are of the same
type and configuration, using SSRC multiplexing becomes easy.
The added twist in Lennox's draft is the multiple media types, and that
does bring a few issues, but not that many.
has discussion on that proposal.

The main point of the multiplexing architecture document is to make
clear that one size does not fit all when choosing between using
multiple SSRCs and using several RTP sessions.

> -          Section 6.2: How do you suggest the sequence numbers and RTP
> timestamps are handled between multiple alternative streams? Would it
> make things easier if these were aligned across multiple streams?

No, I don't believe in alignment. First of all, the sequence numbers
need to progress with the packets actually sent for a media stream.
Encoding alternatives are quite likely to have different packet rates,
especially for video, as soon as one reaches a media bit-rate where some
alternative requires multiple packets per video frame.
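As a toy illustration of why sequence-number alignment cannot hold between
alternatives (the frame sizes and MTU below are assumed, illustrative
numbers only, not from the draft):

```python
# Two encoding alternatives of the same video, packetized per frame.
# RTP sequence numbers advance once per packet actually sent, so the
# two streams drift apart as soon as packets-per-frame differ.
MTU = 1200  # assumed RTP payload bytes per packet

def packets_per_frame(frame_bytes, mtu=MTU):
    return -(-frame_bytes // mtu)  # ceiling division

def seq_after(frames, frame_bytes, first_seq=0):
    seq = first_seq
    for _ in range(frames):
        seq = (seq + packets_per_frame(frame_bytes)) & 0xFFFF  # 16-bit wrap
    return seq

# Low-rate alternative fits each frame in one packet; the high-rate
# alternative needs four packets per frame.
print(seq_after(30, frame_bytes=1000))   # 30
print(seq_after(30, frame_bytes=4500))   # 120
```

After one second of 30 fps video the two sequence-number spaces already
differ by a factor of four, so any initial alignment is immediately lost.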

When it comes to timestamps, I don't see any need either. For the best
robustness and simplicity, one should simply skip alignment and
instead use the existing mechanisms for synchronizing media clocks with
each other, i.e. RTCP SR and Rapid Synchronization of RTP Flows (RFC 6051).
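To make that concrete, here is a minimal sketch (with made-up timestamp
values) of how a receiver can map each stream's RTP timestamps to wallclock
time using the (NTP, RTP) timestamp pair carried in that stream's most
recent RTCP SR, so no cross-stream timestamp alignment is needed:

```python
# Inter-stream synchronization via RTCP Sender Report mappings.
# Each SR pairs an NTP wallclock timestamp with the RTP timestamp of
# the same instant; given the clock rate, any RTP timestamp maps back
# to wallclock. All numeric values below are illustrative.

def rtp_to_wallclock(rtp_ts, sr_ntp, sr_rtp, clock_rate):
    """Map a 32-bit RTP timestamp to wallclock seconds using the
    (NTP, RTP) pair from the stream's most recent SR."""
    diff = (rtp_ts - sr_rtp) & 0xFFFFFFFF
    if diff >= 0x80000000:          # rtp_ts precedes sr_rtp (wrap)
        diff -= 0x100000000
    return sr_ntp + diff / clock_rate

# Two simulcast alternatives of one source: independent, unaligned RTP
# timestamp bases, but SRs referencing the same wallclock.
hi = dict(sr_ntp=1000.0, sr_rtp=90_000, clock_rate=90_000)
lo = dict(sr_ntp=1000.0, sr_rtp=555_000, clock_rate=90_000)

t_hi = rtp_to_wallclock(90_000 + 3_000, **hi)    # 1/30 s after the SR
t_lo = rtp_to_wallclock(555_000 + 3_000, **lo)
print(t_hi, t_lo)  # both map to the same capture instant
```

RFC 6051 then only shortens the time until the first such mapping is
available; the mapping itself is the standard RTCP SR mechanism.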

> And one more remark:
> -          In the introduction session you describe three different ways
> in which the encoding of a media content can differ: bit-rate, codec and
> sampling. Recent work in the area of immersive media has proposed a
> fourth method: tiled video. With tiled video (or spatial segmentation),
> the video resulting from a single camera is split into a number of
> areas, each focusing on a particular spatial area of the video (e.g. a
> single video source could be tiled into four separate video streams, one
> describing the topleft quarter of the video and three more describing
> the topright, bottomleft and bottomright quarters respectively).

If I understand this correctly, what you are really talking about is
breaking up the output of one camera (or several aligned cameras that can
be considered to produce a single video image) into several separate
streams. On the receiver side there is then commonly one device per media
stream. A practical example would be to split the output of a 4k camera
into four 1080p media streams and on the receiver side use four 1080p
projectors that are aligned so the edges match.

If I am correct in my understanding, that isn't simulcast. There are no
two or more different alternatives of the media source. From my
perspective it looks like you are splitting a single media source into
multiple synchronized sources. However, it is clear that if you are
doing tiled video then you would like to consider the transport options
for it, because it matters whether the receiver is actually a single
end-point with four displays connected to it, or whether there are four
different end-points and you want each stream delivered only to its own
end-point, not to all of the others.


Magnus Westerlund

Multimedia Technologies, Ericsson Research EAB/TVM
Ericsson AB                | Phone  +46 10 7148287
Färögatan 6                | Mobile +46 73 0949079
SE-164 80 Stockholm, Sweden| mailto: