Re: [rtcweb] Fwd: I-D Action: draft-alvestrand-one-rtp-00.txt

On 08/12/11 16:55, Paul Kyzivat wrote:
> On 8/12/11 2:45 AM, Harald Alvestrand wrote:
>> On 08/11/11 20:47, Paul Kyzivat wrote:
>>> On 8/11/11 7:32 AM, Harald Alvestrand wrote:
>>>> I have taken some of the information I learned from the discussions in
>>>> Quebec City about the issues of putting video and audio into the same
>>>> RTP session and created a draft from them outlining a solution to the
>>>> problem of signalling this configuration using SDP.
>>>>
>>>> Comments welcome, including the recommended context for processing the
>>>> document; in a few days I'll send the same note to AVTCORE and/or
>>>> MMUSIC.
>>>
>>> Harald,
>>>
>>> A few questions/comments:
>>>
>>> 1) If ICE is being used, how do you expect it to be handled on the 2nd
>>> m-line? Since one of the goals of multiplexing on a single RTP session
>>> is to minimize ICE processing, one would hope only to do ICE on the
>>> first of the grouped m-lines. But if the answerer doesn't support this
>>> grouping, then the ICE will be needed on the others.
>> I tried to be explicit that if negotiation is successful, only the ICE
>> parameters from the first section listed in the a=group:TOGETHER would
>> be used:
>
> Sure. But if the other side *doesn't* support TOGETHER, then ICE will
> need to be negotiated independently for each m-line. And for that to
> work, the port for the 2nd m-line must be active, other ICE candidates
> identified and included for it, etc.
>
> ISTM the implications of that need further explanation.
> (I'm not an expert on ICE. If I were, maybe this would be obvious.)
The offerer has two options:
- Start the ICE procedure on all ports when sending the offer
- Wait until the answer comes back, and then start the ICE procedure
either on one port (if the responder understood TOGETHER) or on all
ports (if the responder did not understand TOGETHER)

The first alternative is some number of milliseconds faster, but others
should chime in on whether that's significant or not.
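
To make the trade-off concrete, here is a rough Python sketch of the two
offerer strategies. All names in it (IceStrategy, Answer, ice_plan) are
made up for illustration and are not from the draft or any ICE library;
it only tracks which ports get ICE started and when, not ICE itself.

```python
from dataclasses import dataclass
from enum import Enum


class IceStrategy(Enum):
    EAGER = "start ICE on all ports when sending the offer"
    LAZY = "wait for the answer before starting ICE"


@dataclass
class Answer:
    # Hypothetical flag: True if the answerer understood a=group:TOGETHER.
    understands_together: bool


def ports_needed(offered_ports, answer):
    """Ports that actually need ICE once the answer is known."""
    if answer.understands_together:
        return list(offered_ports[:1])  # only the first grouped section
    return list(offered_ports)          # fall back to one per m-line


def ice_plan(strategy, offered_ports, answer):
    """Return (started_with_offer, started_after_answer, wasted)."""
    needed = ports_needed(offered_ports, answer)
    if strategy is IceStrategy.EAGER:
        started_with_offer = list(offered_ports)
    else:
        started_with_offer = []
    started_after_answer = [p for p in needed if p not in started_with_offer]
    wasted = [p for p in started_with_offer if p not in needed]
    return started_with_offer, started_after_answer, wasted
```

With two offered ports, the eager strategy never waits but may throw
away the work done on the second port; the lazy strategy wastes nothing
but starts everything only after the answer arrives.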

>
>> The following parameters are taken from the first section only:
>>
>> o Port number from the m= line
>>
>> o All media-level attributes defined in RFC 5245 section 15.1 - this
>> includes "candidate", "remote-candidates", "ice-mismatch", "ice-
>> ufrag", "ice-pwd"
>>
>> Suggestions for improving this language?
>>
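
One way to read the rule quoted above, as a rough Python sketch: find
the first mid listed in a=group:TOGETHER, then take the port and the
ICE attributes from that section only. The function names and the
sample SDP are invented for illustration, and the parsing is
deliberately naive (it is not a full SDP parser).

```python
# Media-level ICE attributes named in the quoted draft text.
ICE_MEDIA_ATTRS = ("candidate", "remote-candidates", "ice-mismatch",
                   "ice-ufrag", "ice-pwd")

# Made-up example offer; addresses and credentials are placeholders.
EXAMPLE_SDP = """\
v=0
a=group:TOGETHER foo bar
m=audio 49170 RTP/AVP 0
a=mid:foo
a=ice-ufrag:abcd
a=ice-pwd:secretsecretsecretsecret
a=candidate:1 1 UDP 2130706431 192.0.2.1 49170 typ host
m=video 51372 RTP/AVP 31
a=mid:bar
a=ice-ufrag:wxyz
"""


def split_media_sections(lines):
    """Group lines into media sections, each starting at an m= line."""
    sections = []
    for line in lines:
        if line.startswith("m="):
            sections.append([line])
        elif sections:
            sections[-1].append(line)
    return sections


def transport_for_group(sdp):
    """Return (port, ice_attribute_lines) from the first grouped section."""
    lines = [l.strip() for l in sdp.strip().splitlines()]
    group = next(l for l in lines if l.startswith("a=group:TOGETHER"))
    first_mid = group.split()[1]
    section = next(s for s in split_media_sections(lines)
                   if f"a=mid:{first_mid}" in s)
    # m=<media> <port>[/<count>] <proto> <fmt> ...
    port = int(section[0].split()[1].split("/")[0])
    ice = [l for l in section[1:]
           if l.startswith("a=") and l[2:].split(":", 1)[0] in ICE_MEDIA_ATTRS]
    return port, ice
```

The ICE attributes in the second section (a=ice-ufrag:wxyz here) are
simply ignored when the grouping is accepted.
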
>>> 2) The draft only shows this mechanism being used to combine one audio
>>> and one video. Do you also imagine it being used to represent multiple
>>> audio and/or multiple video streams that might otherwise be handled
>>> independently? (I see no reason why it couldn't be used that way.)
>>> This would allow description of different properties for each one.
>> I certainly think that any number of m= sections could be combined this
>> way, but in other discussions, I've seen people tend towards thinking of
>> sections as "one m= section for all the audio channels, one m= section
>> for all the video channels". Can you give an example of the usage you're
>> thinking of?
>
> If the intent is to have multiple streams with different
> properties, then this would provide a way to describe them that
> isn't otherwise available. For instance, if you wanted one hi-res
> video stream and one low-res video stream.
>
> I'm also thinking ahead to what might be coming from CLUE
> (telepresence). It isn't far enough along yet to predict
> how it will set up its multi-stream calls, but it will probably want
> to multiplex them over a smaller number of RTP streams. And so I'm
> shopping for potential mechanisms that might be exploited there.
>
>> I'm worried that we may get too many special rules about how parameters
>> from sections are to be combined - multiple codecs with different PT
>> numbers are fine, but other parameters could be confusing, so I'd like
>> to see examples.
>
> Maybe it's premature. But no matter what, rules will need to be written
> on how attributes are combined. So IMO it's worthwhile to at least
> start thinking about how to write those in general, rather than
> writing a few special-case hacks. It may turn out that the rules that
> are to be followed for combining one audio and one video will also
> "just work" for combining two audio or two video, etc.
>
>     Thanks,
>     Paul
>
>>> 3) In the example, for an entity that doesn't understand TOGETHER, you
>>> still show the a=mid lines. This is appropriate for an answerer that
>>> *does* understand a=group. However an answerer that doesn't implement
>>> RFC 3388 would presumably omit the a=mid lines from the answer. So the
>>> offerer had better be prepared for that as well.
>> Yes, that's a reasonable point. I'll add a third example answer.
>>>
>>> Thanks,
>>> Paul
>>> _______________________________________________
>>> rtcweb mailing list
>>> rtcweb@ietf.org
>>> https://www.ietf.org/mailman/listinfo/rtcweb
>>>
>>
>>
>
>