Re: [maitai] Roles of Sender and Receiver

On 12/4/2010 11:30 AM, Peter Musgrave wrote:
> Hi Paul,
>
> I too gravitate towards concrete examples...so I am happy to go down the rabbit hole.
>
> <rabbit_hole>
> One of Charles' points was that an endpoint might not be able to select a specific video feed, but rather say something like "I only receive one stream, please use voice activity to decide which stream to send me".  So the answering EP might not know which of three streams this should be....
>
> It's also starting to seem like either end potentially could be the one who makes the decision.
>
> In the SIP context you present, it is therefore not clear to me that the answering endpoint is always the one making the decision about which media stream to get (as would be "normal" in SIP) and the offeror can't make any decisions until it knows about the receiver's configuration.

This depends a bit on how much you assume about the UAS. In the worst 
case, it's just a generic SIP UA that supports one audio, and maybe one 
video stream. If offered more than one of each, it will probably either 
reject the call or else just accept the first of each and refuse the rest.

For a caller with more than that to offer, it would make sense for *it* 
to put the audio and video streams that are most important *first*.

In that case, the UAS is in some sense making the choice, but in a 
particularly dumb way. It's really the UAC that is making the decision 
about what a dumb UAS will get.
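
To make the "dumb UAS" behaviour concrete, here is a rough Python sketch of how such an answerer might treat a multi-stream offer. It is only an illustration under assumptions: the function names and the `local_ports` mapping are hypothetical, and it builds only the m= lines of the answer. What it does rely on is the ordinary offer/answer rule (RFC 3264) that the answer carries one m= line per offered m= line, in the same order, with port 0 marking a rejected stream.

```python
def parse_media_lines(sdp):
    """Return (media_type, proto, formats) for each m= line, in offer order."""
    media = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            media_type, _port, proto, *formats = line[2:].split()
            media.append((media_type, proto, formats))
    return media


def dumb_answer(offer_sdp, local_ports):
    """Accept the first m= line of each supported type, reject the rest.

    local_ports is a hypothetical mapping such as {"audio": 6000, "video": 6002}
    giving the one port this UA can receive on for each media type.
    """
    already_accepted = set()
    answer_lines = []
    for media_type, proto, formats in parse_media_lines(offer_sdp):
        if media_type in local_ports and media_type not in already_accepted:
            already_accepted.add(media_type)
            port = local_ports[media_type]
        else:
            port = 0  # port 0 in the answer rejects this stream
        answer_lines.append(f"m={media_type} {port} {proto} {' '.join(formats)}")
    return answer_lines


# Hypothetical offer: one audio stream and three video streams.
EXAMPLE_OFFER = "\r\n".join([
    "v=0",
    "m=audio 5000 RTP/AVP 0",
    "m=video 5002 RTP/AVP 96",
    "m=video 5004 RTP/AVP 96",
    "m=video 5006 RTP/AVP 96",
])
```

With this behaviour the outcome is fixed entirely by the order in which the UAC wrote its m= lines, which is the sense in which the UAC is really the one deciding.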

After that one could imagine a hierarchy of increasingly intelligent 
UASs that we might be calling. I'm not certain if that is worthwhile or 
not. For instance, maybe a UAS that is capable of understanding role 
tags for the offered streams and at least making an explicit choice of 
which one(s) to accept, rather than just accepting the first one.
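
A sketch of that slightly smarter UAS might look like the following. How roles would actually be tagged is not settled here; purely for illustration I assume they ride in the existing SDP `a=content:` attribute (RFC 4796, with values such as `main`, `speaker`, `slides`). The function names and the preference list are hypothetical.

```python
def split_media_sections(sdp):
    """Group SDP lines into media sections, each starting at an m= line."""
    sections = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            sections.append([line])
        elif sections:
            sections[-1].append(line)
    return sections


def role_of(section):
    """Return the first a=content: value in a media section, or None."""
    prefix = "a=content:"
    for line in section[1:]:
        if line.startswith(prefix):
            return line[len(prefix):].split(",")[0]
    return None


def choose_video(offer_sdp, preferred_roles):
    """Return the m-line index of the video stream to accept, or None.

    Walks preferred_roles in order and picks the first offered video stream
    carrying that role. If no role matches, falls back to the first video
    stream, which is what a role-unaware UA would have done anyway.
    """
    sections = split_media_sections(offer_sdp)
    video = [(i, role_of(s)) for i, s in enumerate(sections)
             if s[0].startswith("m=video ")]
    if not video:
        return None
    for wanted in preferred_roles:
        for index, role in video:
            if role == wanted:
                return index
    return video[0][0]


# Hypothetical three-camera offer with role labels.
ROLE_TAGGED_OFFER = "\r\n".join([
    "v=0",
    "m=video 5002 RTP/AVP 96",
    "a=content:main",
    "m=video 5004 RTP/AVP 96",
    "a=content:speaker",
    "m=video 5006 RTP/AVP 96",
    "a=content:slides",
])
```

The fallback to the first stream keeps the ordering convention above meaningful even when the UAS does not recognize any of the offered roles.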

After the first O/A exchange, the UAC will hopefully understand 
something about the capabilities of the UAS. Depending on the perceived 
stupidity of the UAS, the UAC might decide it knows best, and manipulate 
what it decides to send over the limited streams the UAS accepted.

If the UAS is discovered to be more capable, then a more complex 
negotiation might be followed.
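
For the simple case, what the UAC learns from the first answer can be sketched as below. This assumes a video-only offer (ignoring audio to keep things simple), and the source names and priority list are hypothetical. It relies only on the answer's m= lines being in the same order as the offer's, with port 0 meaning the stream was refused.

```python
def accepted_indexes(answer_sdp):
    """Return the m-line indexes the answerer accepted (non-zero port)."""
    accepted = []
    index = 0
    for line in answer_sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # The port field may look like "6002" or "6002/2".
            port = int(line[2:].split()[1].split("/")[0])
            if port != 0:
                accepted.append(index)
            index += 1
    return accepted


def plan_sending(sources_by_priority, answer_sdp):
    """Map each accepted m-line index to the source the UAC will send on it.

    The UAS chose *which* streams exist; the UAC still chooses *what* goes
    on them, here by simply filling accepted streams in priority order.
    """
    return dict(zip(accepted_indexes(answer_sdp), sources_by_priority))


# Hypothetical answer in which only the second of three video streams survives.
EXAMPLE_ANSWER = "\r\n".join([
    "v=0",
    "m=video 0 RTP/AVP 96",
    "m=video 6002 RTP/AVP 96",
    "m=video 0 RTP/AVP 96",
])
```

Here the UAS happened to keep the second stream, yet the UAC still puts its highest-priority source on it.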

> In fact in the first exchange we also need to provide some information about media asymmetry (e.g. I may have three screens, but 4 cameras or three screens and two cameras). So my video m-lines do not necessarily imply anything about my ability to send that many videos....
>
> I am not sure how best to do capability exchange. I suspect the description we end up with will be of non-trivial size. Perhaps it is a new Content body type - and these capabilities are exchanged instead of SDP in the first exchange? It would work - but it creeps me out - since I expect it breaks when going through most middle boxes. The other tools in the toolbox which could apply are SUBSCRIBE, NOTIFY, PUBLISH etc. Perhaps there is a role for them. If OPTIONS went end-to-end it could be a candidate...

I don't know about that yet. As you say, there are a multitude of 
possibilities.

> Ok - out of the rabbit hole!
> </rabbit_hole>
>
> Let me ask more questions:
>
> Does the capability exchange need to be part of dialog establishment?
> Could it be done as a precursor exchange?

IMO it would be best to at least *start* the capability exchange as part 
of dialog establishment. You may not know *anything* about the UAS prior 
to the initial INVITE, so it should be possible to start there and 
eventually end up where you need to be.

I know Jon Peterson has been floating some ideas about something like 
you are talking about. But that is a much bigger subject. I don't think 
it makes much sense to do such a thing as a telepresence-specific mechanism.

One possibility is to use presence to discover such capabilities. But I 
think this should be optional.

> Could an endpoint reasonably cache capabilities of EPs it has had prior contact with? [This may be important if our data representation requires a certain amount of "interpretation"]

Well, perhaps. But it then should not be done by AOR, since there may be 
multiple UAs behind an AOR. Again, presence might be the approach to 
doing that.
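
If caching were done, a minimal sketch of keying it per UA rather than per AOR could look like this. Everything here is an assumption for illustration: the class, the shape of the capability data, and the expiry time are hypothetical, and the per-UA key is assumed to be something like the `+sip.instance` value (RFC 5626) or a GRUU.

```python
import time


class CapabilityCache:
    """Remember capabilities per (AOR, UA instance), not per AOR alone."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds      # hypothetical lifetime for cached entries
        self._clock = clock
        self._entries = {}

    def put(self, aor, instance_id, capabilities):
        """Store what was learned about one specific UA behind the AOR."""
        self._entries[(aor, instance_id)] = (self._clock(), capabilities)

    def get(self, aor, instance_id):
        """Return cached capabilities, or None if unknown or expired."""
        key = (aor, instance_id)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, capabilities = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]
            return None
        return capabilities
```

Two devices registered under the same AOR then get separate entries, so a one-screen desktop client is not mistaken for the three-screen room.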

	Thanks,
	Paul

> Peter
>
> On 2010-12-03, at 8:20 PM, Paul Kyzivat wrote:
>
>>
>>
>> On 12/2/2010 7:53 AM, Peter Musgrave wrote:
>>> Hi,
>>>
>>> I have been pondering possible ways of representing a room configuration - but I think I need to back up and ask a question about basic interop architecture.
>>>
>>> In the selection of content on a stream, who is the active decision making entity, sender - or receiver - or both?
>>>
>>> Consider a three display, 3 camera (3d3c) room A sending to a 1d1c room B.
>>>
>>> Does A offer all three streams to B and let B pick? (This leads in the direction of dynamic changes based on B, which is powerful but leads us into continuous control...)
>>> Does A determine B has only one display and select which video B should get and send only one stream?
>>>
>>> IMHO an approach which lets B have control over what it gets is more powerful. e.g. If I have heard the pitch from the speaker 100 times, I want to watch the customer and see if they are engaged, rather than watch the active speaker blather on....
>>
>> Interesting question.
>>
>> Maybe good to start considering this in purely SIP terms without regard to telepresence specifics.
>>
>> In that context I guess A might be sending an offer with three video m-lines, and would not initially know whether the recipient could deal with them all or not. (I'll ignore the audio to keep things simpler.) In a pure SIP context it also may not be apparent what the roles are for the three media streams.
>>
>> Then B would have to answer. If it can't handle three video m-lines, then it will have to choose one, though it may not have any basis to pick one over another. Perhaps it would just take the first.
>>
>> So then we end up with a session having one video stream, with B having chosen *which* of the streams it was, and with A choosing *what* is transmitted on that stream.
>>
>> And of course the challenge of maitai is to improve on that situation.
>>
>> A minimalist improvement is for A to label the offered streams with the roles they play, so that B has a basis for choosing.
>>
>> For maitai it will be necessary to do more than the above, so there is need for more description. But ideally it will be "backward compatible" to the extent that a dumb device will be able to make *some* sense of what it is offered and end up doing something as reasonable as it makes sense for a dumb device to do in that context.
>>
>> 	Thanks,
>> 	Paul
>> _______________________________________________
>> maitai mailing list
>> maitai@ietf.org
>> https://www.ietf.org/mailman/listinfo/maitai
>
>