Re: [MMUSIC] Proposal for what bundle should say about demux

Colin Perkins <> Mon, 03 June 2013 21:25 UTC

From: Colin Perkins <>
Date: Mon, 03 Jun 2013 18:48:03 +0100
To: Cullen Jennings <>
Cc: mmusic WG <>
Subject: Re: [MMUSIC] Proposal for what bundle should say about demux


On 30 May 2013, at 20:06, Cullen Jennings (fluffy) wrote:
> I've been deliberately merging layers A, B, C all into one in my discussion of this and focusing on the media stack code which has to deal with all those layers. The reason I have been doing that is that people just seemed too confused about what all the parts of RTP are, such as RTP sessions. But most people seemed to understand which bits in the various packets they could look at, and what data had to end up at the right spot in the end.

I agree that there is some confusion here. However, rather than conflate the layers, I think it's important that we clearly identify the different layers and their purposes. WebRTC is trying to use "advanced" features of RTP, such as multiple simultaneous streams, FEC, layering, simulcast, and RTCP-based feedback, within a single session. To make that work, I think we need a clear understanding of what's done (conceptually) where, otherwise we run the risk of misunderstandings causing non-interoperable behaviours.

Conflating all the layers may work for a simple endpoint, with limited functionality, but I think it will cause increasing problems for WebRTC. I'd like to get agreement on what conceptual layers exist, and how they are demultiplexed, to remove this confusion.

> At a high level, I would draw the layers a little differently than you, in that many of the implementations I work with do the jitter buffer after the data is decoded. This allows the implementations to do better packet loss concealment and play rate games where they speed up or slow down playback very slightly to smoothly adapt the jitter buffer depth.
> But regardless of the details of the order, the jitter buffer needs to look at the SSRC and if the SSRC has changed, it probably needs to retrain the jitter buffer. 


> Let's talk a bit more about your diagram and what happens when Alice creates an Offer that does not use bundle and has two audio codecs. When RTP packet #1 gets sent to Alice and it arrives before the answer, somewhere in layer B/C, something needs to look at the PT value to decide what codec this goes to. Alice knows nothing about what SSRC the other side is using. Now let's say that packet #2 arrives (again before the answer) and packet #2 is on the same port, same PT, but has a different SSRC. None of the implementations I work with for audio would allocate a new PB or CR. Instead they would pass packet #2 through the same codec instance that packet #1 had used and expect things like the jitter buffer to note the SSRC change and act accordingly. For DSP-based audio systems where the actual codec instances are loaded into the DSP and have a reserved set of resources allocated for them, this is a very common design.
> There is probably more variety in video applications, but many conference bridges act this way.
> So I have no problem if things split on SSRC and whatever processing can be done knowing only that is done there. That probably includes RTCP, FEC, RTX, who knows. But before the packet can be decoded by the codec, something needs to consider the PT and other things, such as port when not doing bundle, to make this work. Keep in mind things like PLI are much more likely to be sent at the layer near the codec than at layer B, so layers B and C are not as cleanly separated as one might wish.

Sure. A real implementation will have all sorts of links between the layers. However, I still think there are separate conceptual stages in the demultiplexing: a) separate by protocol (RTP, RTCP, STUN, SCTP, etc.) and pass to that conceptual protocol instance; then for RTP and RTCP packets b) find what RFC 3550 calls synchronisation sources, and do the RTP and RTCP layer processing, FEC, RTX, etc; and c) relate those synchronisation sources to higher-layer concepts in the application, decode the media, and render.
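As a rough sketch of conceptual stage (a), a received UDP datagram can be classified by its first byte, along the lines of the ranges in RFC 5764 section 5.1.2 (this is only an illustration; a real endpoint also has to cope with things like SCTP carried inside DTLS):

```python
def classify_packet(data: bytes) -> str:
    """Stage (a): split protocols sharing one UDP port by first byte.

    Ranges follow RFC 5764 section 5.1.2; SCTP is assumed to arrive
    inside DTLS records rather than as a distinct first-byte range.
    """
    if not data:
        return "empty"
    b = data[0]
    if b <= 3:
        return "stun"        # STUN/ICE
    if 20 <= b <= 63:
        return "dtls"        # DTLS, possibly carrying SCTP
    if 128 <= b <= 191:
        return "rtp_rtcp"    # stage (b) splits further by SSRC
    return "unknown"
```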

Finding the synchronisation sources at layer B is straightforward: you look at the SSRC field of the RTP and RTCP packets and group by it, following RFC 3550.
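Concretely, the layer-B grouping key is just the 32-bit SSRC field at offset 8 of the RTP fixed header (RFC 3550 section 5.1); a minimal sketch:

```python
import struct
from collections import defaultdict

def rtp_ssrc(packet: bytes) -> int:
    """Return the SSRC: bytes 8-11 of the RTP fixed header,
    network byte order (RFC 3550 section 5.1)."""
    if len(packet) < 12:
        raise ValueError("shorter than the RTP fixed header")
    return struct.unpack_from("!I", packet, 8)[0]

# Layer B: one playout buffer (here just a list) per synchronisation source.
playout_buffers = defaultdict(list)

def on_rtp_packet(packet: bytes) -> None:
    playout_buffers[rtp_ssrc(packet)].append(packet)
```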

Mapping from synchronisation source to codec/rendering context at layer C can use any combination of the PT, a=fmtp: parameters of the PT, SSRC, RTCP SDES items, RTP header extensions, etc., that you care to signal.
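One illustrative shape for that layer-C step (the names and table contents below are hypothetical, not anything the drafts specify): bind each synchronisation source to a codec/rendering context the first time any signalled key resolves it. Only a PT table is shown; signalled SSRCs, SDES items, or header extensions could be consulted the same way.

```python
class LayerCMapper:
    """Hypothetical layer-C mapper: binds each SSRC to a
    codec/rendering context the first time it becomes resolvable,
    using a PT-to-context table assumed to come from SDP."""

    def __init__(self, pt_to_context: dict):
        self.pt_to_context = pt_to_context
        self.bound = {}  # SSRC -> context, learned from packets

    def context_for(self, ssrc: int, pt: int):
        if ssrc not in self.bound and pt in self.pt_to_context:
            self.bound[ssrc] = self.pt_to_context[pt]
        return self.bound.get(ssrc)  # None until resolvable
```

Once bound, a source keeps its context even if the PT changes, which matches the expectation that an SSRC change, not a PT change, is what resets per-source state.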

> That's a long way of saying, I was talking about the aggregate demux in the media stack that includes whatever happened at layers B and C. If layer B only used SSRC, that's fine with me. I am trying to say what happens in the combination of layers B and C put together.
> What terminology should I be using to not cause confusion?

While you can conflate layers B and C in your implementation, I think we need to discuss them separately in the specifications to avoid confusion. Demultiplexing at layer B finds the synchronisation sources and is an RTP layer feature. I think you're talking about mapping from synchronisation sources to codecs/rendering pipelines, which is an operation at layer C of my diagram. 

That is, I think you can do what you want entirely at conceptual layer C by considering the mapping from synchronisation sources to codec and rendering context. I don't think you need to conceptually combine layers B and C, and I don't think we should combine them in the specification (of course, just because we specify the demultiplexing as happening in three conceptual stages doesn't mean you can't implement it in whatever way you like, provided it has functionally equivalent behaviour).


> On May 27, 2013, at 7:57 AM, Colin Perkins <> wrote:
>> I don't agree with the phrasing about "packet processing pipelines", but can't tell if this is a terminology disagreement or a more fundamental disconnect. The way I see the demultiplexing logically working is:
>>                   |
>>                   | packets
>>   +--             v
>>   |           +----------+
>>   |           |UDP socket|
>> A |           +----------+
>>   |        RTP ||  |  |
>>   |   and RTCP ||  |  +------> SCTP
>>   |            ||  +---------> STUN/ICE
>>   +--          ||
>>   +--          ||
>>   |       split by SSRC
>>   |       ||   ||   ||
>>   |       ||   ||   ||
>> B |      +--+ +--+ +--+
>>   |      |PB| |PB| |PB| Playout buffer, process RTCP, FEC, etc.
>>   |      +--+ +--+ +--+
>>   +--      |   |     |
>>   +--      |  /      |
>>   |        +---+     |
>>   |         /  |     |
>> C |      +--+ +--+ +--+
>>   |      |CR| |CR| |CR| Codecs and rendering
>>   |      +--+ +--+ +--+
>>   +--
>> If your algorithm is for demultiplexing at layer C, i.e., to figure out what codec and rendering pipeline to use, then I think we're in agreement apart from terminology. 
>> For layer B, I believe the SSRC is the right thing to use to demultiplex, and fits with RTP and RTCP. This is where RTCP is processed, playout de-jitter buffering happens, FEC is processed, NACKs are sent, etc. It's logically independent of the decoding and rendering process since you can start filling your de-jitter buffer for an SSRC before you figure out if/how/where you're going to render that SSRC. 
>> For layer A, there was a clear-cut way of doing this with RTP, RTCP, and STUN. I haven't looked at SCTP enough to know how that affects things. I do think it's a logically separate issue, and should be documented separately to BUNDLE though, since the same issues arise with non-bundled sessions.
>> An implementation might merge these together, of course, but to avoid confusion the standards should be clear what level they're considering.
>> Colin
>> On 27 May 2013, at 14:15, Cullen Jennings (fluffy) wrote:
>>> Great - sounds like we agree this algorithm will work.
>>> On May 27, 2013, at 6:41 AM, Colin Perkins <> wrote:
>>>> I'm not sure I agree.
>>>> As I said in my previous message to the list, if we are agreed that the m= lines in a BUNDLE group form a single RTP session, then I believe we need unique payload types across all m= lines. In this case, BUNDLE can simply say that regular RTP source demultiplexing based on the SSRC has to be performed, then the payload type can be used to match sources to m= lines for those applications that care about doing so. 
>>>> If we're not agreed that the m= lines in a BUNDLE group form a single RTP session, then we have a lot more to discuss...
>>>> Colin
>>>> On 23 May 2013, at 19:02, Cullen Jennings (fluffy) wrote:
>>>>> Here is my proposal for roughly what the bundle draft should say about this demux topic 
>>>>> The application will decide which packet processing pipeline to pass a given RTP/RTCP packet to based on what the application knows:
>>>>> 1) If future RFCs define new things (like an RTP header extension) that explicitly specify the mapping, check if that future RFC is in use and, if so, use that to form the mapping 
>>>>> 2) If the PT uniquely identifies a mapping, use that to form the mapping
>>>>> 3) If the application knows the SSRC the other side will use, use that to form the mapping 
>>>>> 4) If there is no way to know which pipeline to pass it to, the packet MAY be discarded or the application MAY decide to buffer it until the mapping is known 
>>>>> This is trivial to implement. It meets the requirements for Plan A, Plan B, UCIF, CLUE, and so on. 
>>>>> We could swap the order of steps 2 and 3. My thinking for this order was that the only time the order made any difference was if the PT were unique and indicated a different mapping than the SSRC. The only way this could happen is with an SSRC collision, so the PT is the one that would be correct, not the SSRC. If someone feels strongly the order of steps 2 and 3 should be the opposite way around, I can live with that.
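For reference, the four-step demux proposed in Cullen's message above could be sketched as follows (the packet representation and table names are illustrative only; the maps are assumed to be built from signalling):

```python
def choose_pipeline(pkt: dict, ext_map: dict, pt_map: dict,
                    ssrc_map: dict, pending: list):
    """Steps 1-4 of the proposed demux, tried in order. `pending`
    holds packets whose mapping is not yet known; step 4 allows
    either buffering, as here, or discarding."""
    ext = pkt.get("header_ext")
    if ext is not None and ext in ext_map:   # 1) future-RFC mechanism
        return ext_map[ext]
    if pkt["pt"] in pt_map:                  # 2) unique payload type
        return pt_map[pkt["pt"]]
    if pkt["ssrc"] in ssrc_map:              # 3) known remote SSRC
        return ssrc_map[pkt["ssrc"]]
    pending.append(pkt)                      # 4) buffer until known
    return None
```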


Colin Perkins