February 5-7th, 2013 RTCWEB/MMUSIC Joint Interim Minutes

MMUSIC Chairs: Flemming Andreasen, Ari Keränen
RTCWEB Chairs: Ted Hardie, Cullen Jennings, Magnus Westerlund (not present)

Proceedings: http://www.ietf.org/proceedings/interim/2013/02/05/rtcweb/proceedings.html
Recordings:
https://cisco.webex.com/ciscosales/lsr.php?AT=pb&SP=MC&rID=65855352&rKey=10997854d4b84af5
https://cisco.webex.com/ciscosales/lsr.php?AT=pb&SP=MC&rID=65887687&rKey=8c217cc13c9e0ea7

Note takers: Mary Barnes, Spencer Dawkins, Ari Keränen, Adam Roach, Tim Terriberry

February 5th

Topic: General Introduction to SDP issues raised by RTCWEB work
Presentation: http://www.ietf.org/proceedings/interim/2013/02/05/rtcweb/slides/slides-interim-2013-rtcweb-1-2.pptx

Participants discussed both the IETF and WebRTC use of terms like "media stream" and "track", attempting to work toward a common vocabulary from a set that is currently somewhat confusing. There were suggestions that the WebRTC group consider changing MediaStream to MediaStreamTrackSet, in order to help developers understand that what is now a MediaStream may contain media in multiple tracks, but these did not have consistent support. It was agreed that extreme care in the use of the terms was required. The relationship among them needs to be explicit, in part because the WebRTC group needs to understand how to plumb the tracks (within MediaStreams, whether on PeerConnections or local) to SDP-level constructs like SSRCs.

Participants then discussed the updated functionality required by RTCWEB and CLUE, going off the discussion on pages 17-32 of the same presentation. This discussion was wide-ranging and, unfortunately, not perfectly captured. The three main topics were: signaling for sessions that have multiple sources of the same type sent from a single endpoint and received by a peer endpoint; transport aggregation; and multiple encodings for a single media source (here meaning camera, microphone, or similar).
For signaling, among the points raised: CLUE and RTCWEB likely have two different usage patterns. For CLUE, multiple media sources will be common, multiple encodings per media source will be the norm, multiple endpoints may be visible even in unicast transport, and there may be one or more RTP sessions. For RTCWEB, the current theory is that there is a single RTP session per media type, if not a single RTP session full stop. All SSRCs related to a single peer connection (which could encompass more than one MediaStream, each having more than one track) would come from a single SSRC space.

The group also discussed how to handle implicit SSRCs; this gave rise to a larger meta-discussion on whether the aim of the MMUSIC work was to create something that met just RTCWEB's needs or had a larger scope. Among the comments: Jonathan Lennox pointed out that it was MMUSIC's job to provide tools, with RTCWEB being among those requesting tools. Flemming said that it might be useful to constrain the problem set in order to make progress. Hadriel said that in past meetings we've looked for a common solution.

Decisions:
* Decision - We won't use the words "media stream" without an identifier for context.
* Action item to develop new terms? Jonathan will provide the list of laggards who volunteered to help him do that, and the chairs will nag them.
* Action - Ted will capture Jonathan's comments at the mike for inclusion in RTCWeb documents.
* Dale Worley asked that "RTP session" be included in the rune reading.
* Action - Richard Ejzak to think about doing a use case-by-use case analysis of the various approaches.

Scope of work options:
Option 1 - driven by broader RTCWeb use cases: 14 hands
Option 2 - add CLUE use cases: 9 hands
Option 3 - add any offer/answer use cases: 4 hands

February 6th

BUNDLE, ONE-RTP, MMT: http://www.ietf.org/proceedings/interim/2013/02/05/rtcweb/slides/slides-interim-2013-rtcweb-1-0.ppt
* Note - If we proceed with MMT, we should use m=application instead of m=anymedia.
* Question - Are folks generally okay with any of the approaches that reduce the number of long-lived flows to this level, OR is there some other feature in one of the proposals you must have before this meets your needs? 6 for the former, 17 for the latter.

Cullen's plan to dig us out of this hole: http://www.ietf.org/proceedings/interim/2013/02/05/rtcweb/slides/slides-interim-2013-rtcweb-1-10.pdf
* Proposed Superset of Requirement 2 - Should be able to interop with legacy without the JS doing anything special.
* Proposed Relaxation of Requirement 2 - Should be able to interop with legacy, but not necessarily in one O/A exchange within a single dialog (without relying on failure cases).
* Requirement 3 - Change video flows to media flows.
* Proposed Requirement - When we have a large conference with 1000's of people, we don't have to send new signaling to every participant of the conference.
* Proposed Requirement - Must be able to configure a set of similar flows such that once established, the endpoints can add or remove flows without having to do an O/A cycle (i.e., without having to exchange SDP).
* Proposed Requirement - Have this process terminate in some finite period of time.
* Point of Discussion - Plan A: same port on each m-line or different candidate sets?
* Action - Plan B: Write the drafts to extend SDP to explicitly describe multiple cameras with multiple streams from each camera under a single m-line.
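For orientation, a bundled offer along the lines being discussed might look roughly like the following. This is a hypothetical sketch, not text from any of the proposals: the a=group:BUNDLE grouping and the shared port reflect the "same port on each m-line" variant of Plan A, and all identifiers are invented.

```
v=0
o=- 20518 0 IN IP4 198.51.100.1
s=-
c=IN IP4 198.51.100.1
t=0 0
a=group:BUNDLE audio video
m=audio 54400 RTP/SAVPF 0
a=mid:audio
m=video 54400 RTP/SAVPF 96
a=mid:video
a=rtpmap:96 VP8/90000
```

Both m= lines advertise port 54400, so once negotiation succeeds a single long-lived transport flow carries both media types, which is the flow-reduction goal discussed above.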
* Decision - Should we come up with a set of requirements and proceed with Plan A and Plan B with the timelines outlined here? 24 for, 5 against.

DataChannel SDP
* Action - Justin to draft API for SDP-only channel open.
* Decision (rtcweb hats) - 2a) In-band only: 6; 2b) SDP & in-band: 10; 2c) SDP-only: 15. Work on both will continue until a decision is reached.
* Action - Martin believes he can send a proposal to the list which will change some minds.

February 7th

MSID - Harald Alvestrand
1. Mechanism for declaring associations between SSRCs
2. Might be between sessions (or within sessions)

Flemming: Does this include Data Channels?
HTA: Data channels don't need this.
Martin: I don't think we need the association to MediaStreams.
HTA: This is meant for one end to identify one end of a thing using the same identifier as the other end.

1. MSID semantics are declared by an msid-semantic line
2. Currently, only "WMS" defined (WebRTC Media Stream)
3. Means "Deliver these sources to these media stream tracks on the other side."

draft-alvestrand-mmusic-msid-02 is the most recent version. Approved for adoption, just waiting on charter now.

Bernard: I'm confused by the difference between these being communicated at the session level and identifying streams in different RTP sessions.
(Some clarification about the intended scope of binding follows.)
EKR: If I understand correctly, every m= line in the same MediaStream has the same MSID identifier first half, but a different second half.
HTA: That's correct.
Flemming: The chairs are under the impression that they are waiting for the author, not the other way around.
HTA: I'm waiting for the charter to be amended.
ACTION ITEM: Chairs to amend charter to include a milestone for this work.
(Interchange between Martin and HTA regarding the identification of simulcast streams, indicating that they're treated as different tracks)
Paul K: The other day, we had a discussion of all the meanings of "Stream" (e.g. one SSRC, one "flow", etc.).
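As a rough sketch of the syntax under discussion, EKR's "same first half, different second half" observation could look like this in SDP. All identifiers here are invented, and the sketch is based loosely on the msid drafts; the exact attribute form (per m= line vs. per a=ssrc line) varied across draft versions, so treat this as illustrative only.

```
a=msid-semantic: WMS
m=audio 49170 RTP/SAVPF 0
a=msid:ms-1 trk-audio
m=video 49172 RTP/SAVPF 96
a=msid:ms-1 trk-video
```

The shared first token ("ms-1") says both m= sections feed the same MediaStream on the far side; the differing second tokens name the individual tracks within it.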
Paul K (cont.): This just maps to the SSRC definition. Is there any way to get at the other notion with this mechanism?
HTA: Which other notion?
Paul K: The one where we have several SSRCs, but only one at a time, and those SSRCs are stitched together to make a single logical construct.
Jonathan L: There is an assumption in current SIP endpoints that the device should only need to decode one thing at a time. That doesn't necessarily mean that they're semantically the same thing. For example, you can get ringback followed by an announcement that your call is being forwarded, then the person you're calling. Those probably aren't the same thing, from an MSID perspective.
HTA: All we say is that this media stream belongs to this MSID. It's a W3C issue to define what "belong to" means. If you have to distinguish between "play these all at the same time" or "play only one," that's communicated using a different mechanism.
Jonathan L: If we want to use this as opposed to a=group for things like simulcast groups, we need to be very clear about which one applies.
HTA: Do we want to express the relationship between SSRCs in the same RTP session, or different RTP sessions?
Jonathan: RTP sessions or m lines?
HTA: I think we want to say m lines. I'll check the document and make sure this makes sense.
Paul: Is it valid to map more than one m-line to the same msid?
HTA: Yes.
Paul: Why?
HTA: Layers in a layered codec, for example. Channels in a simulcast, for another. Different types of sequential media, as a third.
Martin: There's a fourth one also.
HTA: Oh, there's many more.
HTA: Request to make sure we use the same terms as other documents.

________________

Media Source ID: Something we need? - Stefan Håkanson
(See slides for diagram of motivating use case scenarios)

EKR (Regarding "to a peer" use case): The alternate approach is to have a single stream with all three tracks in it, and have the far end switching the video but not the audio.
EKR (cont.): I'm sensitive to making this more complicated than necessary.
Christer: We'll be sending the audio in two different tracks, right? Isn't that a waste of bits?
Stefan: For these tracks, you can individually pause/resume to prevent that problem.
Dominique: If the only problem here is telling JavaScript developers "don't do that then," I think we just need to document this somewhere. The complication of additional sync isn't worth it.
EKR: Let's say I create two PCs and attach them to the different audio streams. Should they be synched?
Stefan: Yes.
EKR: I'm not sure I agree.
Cullen: As far as I can tell, media stream is only a synchronization construct.
Stefan: I think we want to sync between media streams.
Cullen: Okay, so how do you signal streams that don't need to be synched?
Stefan: What's the use case?
Cullen: Whiteboards, for example.
Adam: One scenario here is using multiple PCs to do simulcast.
Stefan: I cover that on the next slide.
Justin: Why are we sending both streams A and C (audio streams)?
Stefan: We don't have any tools to tell developers not to do this.
Justin: We can't stop people from writing bad programs. We might need to do more examples, but expecting that we'll get perfect behavior for imperfect scripts... we're going to have to do a lot of work.
HTA: We have the tool for this. It's called "English," and we put it in the spec. Right now, we say tracks are synched within a stream, and that they're not synched between streams. If we document this property and developers mess it up, it's their fault.
Ted: Is there a driving use case other than "someone didn't read the spec"? If you had such a use case, I could understand this better.
Stefan: Simulcast with two PCs, for example.
Dominique: There will be bad programmers. Their programs will fail. We don't need to protect against all bad programming. As long as we have the tools to do things right, and have documented them, then our job is done.
Jonathan: I understand that AVTCORE isn't here, but I'm worried that RTP defines sync by CNAME. How do browsers know to use synchronization via something other than CNAME?
Cullen: My understanding (as RTCWEB chair) was that browsers fully intend to send CNAMEs according to this scheme.
Jonathan: How about receipt?
Cullen: They should match up.
EKR: That's not going to work. Media streams are generated prior to receiving any RTP. Given that, we must be able to derive that two things are synched via SDP absent any RTP mechanism. If there's a conflict between SDP and RTP, we need to harmonize them.
Jonathan: (missed this statement)
EKR: My assumption is that browsers will treat things in separate MSIDs (or with no MSIDs) as if they are separate streams (or merge them into the same stream).
Jonathan: If you have one audio and one video, they're probably supposed to be synchronized.
EKR: (missed this statement)
Cullen: Is there a draft for putting CNAMEs in SDP?
Jonathan: (RFC 5576), but no one is going to want to use it.
Jesup: I think when you have legacy incoming SDP, the browser is going to need to stuff them all in the same media stream and sync them. You can wait for the CNAMEs if you want to, but realistically you'll be covered just fine if you don't.
Jonathan: I'm not sure there's any guarantee that synchronizing information has any relationship to each other if the streams have different CNAMEs. Consider decomposed sources with clock skew.
Martin Thomson: Consider that sync is going to act transitive. If you sync A with B, and C with D, then synching B with D will automatically synch A with C. Also, I think if you start to sync things that shouldn't be synched, you're going to end up with some surprising failures.
Hadriel: Are you saying the media stream can override the CNAMEs?
Fluffy: I don't think we have consensus. I'd consider that unresolved.
Hadriel: When these come in from gateways, they're not going to be organized by CNAMEs.
Action Item for chairs: Determine how we handle CNAME, MediaStreams, MSIDs, CNAME changes, conference mixing with no CSRCs, and the related morass.

Stefan: I think we'll call that the end of this timeslot.

EKR: MSID is the only thing we can use to assemble media streams. Lacking that, we'll need to treat them as all separate or all together.
Hadriel: Can we pick one and standardize it?
EKR: Sure.
EKR: We'll have to assume that everything with the same MSID is in the same media stream.
(Some proposal about how to handle CNAME/MSID mismatches, but it was too fast to follow.)
HTA: ...
HTA: Another question is whether we should condition these on CNAMEs, when we can get RTP packets without CNAMEs at all.
Ted: We need to consider whether we really need to nail down behavior for the corner case where a browser gets SDP from a non-browser, given that it's really a corner case. It's easier to say "if there's no MSID, assume they'll be treated separately."
Jonathan: The ability to sync depends on receipt of a first CNAME anyway.
Hadriel (responding to Ted): Wait, when we're gatewaying, we don't know how to put MSIDs in. This isn't going to be as much of a corner case.
(Conversation ensues about non-sync behavior being "close enough," whether MSIDs need to be mandatory, some consistency about what happens if they're missing, etc.)
Martin: I think we can handle them being separate in the absence of MSID.
Hadriel: So you're saying that you're okay if they're separate?
Martin: Yes.
Hadriel: Me too. I don't actually care, I just want one or the other to be specified.
Bernard: I argue that RTP streams with the same CNAME should sync even if we specify them as different streams.
Tim: You can't assume that things that are not in the same media stream are not synched.
Cullen: In the case of a single-stream voice/video phone using SIP, do we think the browser playout should be synchronized?
Martin: What Ted said is that we need to construct MediaStreams, which may or may not be synched.
Martin (cont.): If they turn out to be synched (for whatever reason), cool. But we don't make any promises.
Cullen: But is there a requirement for the browser to tell the app that these things are in sync?
Martin: No, I think the browser can just do that.
Hadriel: But are we going to require that they are synched if they have the same CNAME?
Martin/Ted: No.
Ted: They might be synched, but they are not guaranteed to be. It's an "underpromise and (potentially) overdeliver" situation.
Cullen: That's not what Hadriel said.
Jonathan: I think we're overpromising and underdelivering if we have two RTP streams in the same media stream but different CNAMEs. If they're not the same CNAME, you can end up with "these streams are 40 years out of sync," since they can be using different clocks. That's probably bad.
EKR: I would assume that you stop trying to sync them. There's no other way to imagine how you do that.
Jonathan: What's the threshold for that?
Martin: Implementation decision.
Dan: What do you expect to happen if you have two media flows from wherever -- different devices, different PCs -- and then you say "synch these"? What do you expect to happen? Starting from the point at which I put them together, I would expect them to advance at the same rate.
Jonathan: Yes, that's what happens. The only time that won't happen is if you're trying to deal with sync skew.
(Missed some of the discussion here)
Hadriel: I could be very wrong, but it seems that the browser guys shouldn't want to leave this up to implementation. This would lead to interop failures, right? You'd have scripts that do different things, right? I think if we have two tracks with the same CNAME, we should guarantee that they'll be synched.
Bernard: I think the RTP usage document already says that, right?
Tim: Talking about advancing clocks at the same rate, that was what we first said about putting two things in the same media stream, right?
Dan: Right, what I said was that a naïve coder would expect that putting two things together in the same media stream should be guaranteed to proceed at the same rate as each other.
HTA: When I set up a stream, I have a jitter buffer and a playout clock. If we combine tracks from different remote sources (and it's natural to think we should be able to), some of these clocks won't run at the same rate, which means someone somewhere needs to harmonize them (which introduces artifacts). You have to decide where the jitter buffer lives; you have to determine how you handle skew.
Keith: Most of this conversation is in the purview of AVTCORE, not this conversation and not these working groups. We need to communicate any new requirements to AVTCORE.
Cullen: I don't think we have any conclusions in need of being communicated.

________________

Trickle ICE
1. The point of trickle ICE is to optimize a characteristic of ICE; right now, discovery is handled serially. The idea is that we can handle these steps in parallel.
2. The first time around, we tried doing it in a very generic fashion, in particular agnostic to offers and answers.
3. We were told that we also need to specify how to do this in SIP.
4. We were told to expand on half-trickle.
5. We were told to explain interop with ICE Lite implementations.

Changes:
1. Now advertise support in a new attribute
2. New syntax for offer/answer with no addresses (0.0.0.0, port = 1)
3. Syntax for announcing end of candidates (a=end-of-candidates)

EKR: The API is already defined, so we're just discussing wire syntax.
Adam: Make sure your syntax talks about IPv6.
Emil: You'll still put IPv4, and then include v6 candidates.
Flemming: (something about SDP syntax)
Hadriel: End of candidates really means "this is my final SDP change."
Emil: It really means that we're done sending candidates.
Hadriel: It's really saying we're going to stop changing SDP.
Emil: It could be just a different message to the other UA.
Jonathan: I think the advantage to "a=end-of-candidates" is that you can start doing the ICE conclusion and/or declare failure (or success on a non-preferred transport, like TURN).
Jonathan: Use port 9 rather than 1.
(Discussion about the use of 0.0.0.0, which used to have different semantics)
Adam: It would be nice if we use something different, like "IP6 ::" rather than "IP4 0.0.0.0" -- otherwise, you have to add special-purpose code to existing SDP libraries.
(Some discussion about how to handle this with SIP, and that we need to include a reference to the SIP document)
Jonathan: It's possible that you might need a new offer/answer before trickle concludes. You'll want to include the previously communicated candidates in the new messages, which would include the "a=end-of-candidates" if necessary.
(Some discussion around how long you run trickle versus when you do ICE restarts)
1. Used to say that you only do trickle if you know the remote party does trickle. Now say that you can do half-trickle if you don't know the remote end supports it.
(Some very confusing discussion about 3PCC)
Dale: If you send an offer, and get an answer, and then send a subsequent offer in the same dialog, you're not guaranteed that the subsequent offer goes to the same implementation. So you can't make any assumption about the capability of the thing receiving the new offer.
Flemming: That seems to be very problematic.
Dale: Yes, but there are solutions, and my draft discusses these.
(Some clarifying questions about that mechanism)
Flemming: Need clean separation for indication of support and indication of "I'm going to use this right now" (e.g., "a=trickle-ice:on").
Emil: So what are the semantics of saying "I can do trickle, but I'm not doing it right now"?
Jonathan: It lets you indicate to the remote party that you can receive trickle.
EKR: You can detect whether it's being used by the presence of candidates.
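Pulling the pieces above together, an initial trickle-style offer m-section might look something like this sketch. It is illustrative only: the dummy address and port conventions ("IP4 0.0.0.0" vs. "IP6 ::", port 1 vs. 9) were exactly what was being debated, and the ufrag/pwd values are invented.

```
m=audio 9 RTP/SAVPF 0
c=IN IP4 0.0.0.0
a=mid:audio
a=ice-ufrag:8hhY
a=ice-pwd:asd88fgpddajzjYhsd7uaz6h
```

Candidates then arrive in later messages as they are gathered, closed out by the end-of-candidates marker:

```
a=candidate:1 1 UDP 2130706431 192.0.2.1 5000 typ host
a=end-of-candidates
```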
Jonathan: I think what I was saying before about 3PCC -- if you're using ICE, it won't work anyway without ICE restart -- so I think you're okay.

1. Half Trickle
Mostly meant for SIP. Involves the offerer sending all attributes in the offer, but getting candidates trickled back by the answerer. (From that point, we can use full trickle in both directions.)

OPEN ISSUES
1. Do we really need the stream index?
EKR: All we need is some way in the API to indicate where they go.
Emil: So this should be taken care of by the protocol that uses Trickle ICE.
Emil: Currently, we say to use either index or mid. When would we ever use just index?
(I must have missed something because this seems completely non-sequitur)
Flemming: We need to define a way to use this in offer/answer.
Cullen: Do we have a way to handle this if there are no mids at all? (Murmurs from the room seem to think so.)
(Some discussion about ordering of attributes, and whether "end-of-candidates" needs an explicit index)
Justin: We could put both mid and m-line index, but we're proposing just mid for simplicity. Does that work?
EKR: Right now, I read it as saying you MUST have an m-line index, and SHOULD have a MID.
Justin: Is there any reason to do trickle with something that doesn't do mid?
Cullen: We can mandate that all trickle users include a mid. (People seemed to like this.)
Emil: We define these as m-line attributes. Should they also be session-level?
HTA: Is there a reason why we have "end-of-candidates" rather than "more-candidates-pending"? It would seem to be less disruptive if the final SDP looks "normal."
Justin: We might have a failure case that causes "end-of-candidates" to be sent but we didn't know we were done prior to the failure.
Flemming: Is this SDP or not?
Emil: This isn't SDP. It's what we send.
EKR: Why isn't this real SDP?
Emil: Okay, so you can consider this SDP. Candidates can go out-of-band also.
Flemming: So we're defining "end-of-candidates" in SDP?
Emil: Yes.
Flemming: Session-level attributes are the source of all evil in SDP. Don't do it. Keep it at the media level.
Jonathan: The ICE algorithms do have cross-m-line functionality. We might not want to treat m-lines independently, since there are whole-SDP-session semantics at play.
(Sidetrack argument involving extreme esoterica regarding ICE pacing here)
Jonathan: I vote for putting a=end-of-candidates at the session level, meaning all media sections are done.
Emil: It's difficult to put it in there for RTCWEB.
Ted: Sure, but we should say RTCWEB always sends it for each media section.
EKR: Right now, keep in mind that we don't have events for individual media streams, only for the whole session.
Ted: We can fix that, right?
EKR: Right.
EKR: Oh, crap. This won't work. (I didn't understand why, but I assume EKR will propose a solution to the list.)

1. ICE Lite and Candidate Signaling
Jonathan: We think this just works.
Justin: We should include some examples to show how this works.

1. New Candidates after ICE Completion
Do we do this with ICE restart, or should we continue to trickle?
Jonathan: It needs to be ICE restart. I might have already released my TURN server.

1. SIP Usage for Trickle ICE
SIP applications will always do half-trickle unless otherwise configured. Trickling of candidates will be done with an INFO package.
Jonathan/Adam: This should be application/sdpfrag, not application/sdp.
Flemming: We need to be careful here; we're defining two different means to communicate SDP information. This can go wrong quickly.
Cullen: Always think of this as a diff, not an alternative mechanism. That saves you from Flemming's tar pit. Also, we could define this in terms of a patch syntax.
Hadriel: Are you going to talk about when this can happen?
Emil: Yes! Right now! (check slides for diagram)
Christer: We're changing the SDP body. Do we change the SDP version number or not?
Christer: Also, we need to ensure that the 180 is sent reliably.
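To make the sdpfrag idea concrete, a trickled candidate carried in SIP INFO might look something like the following. This is purely illustrative: the Info-Package name, the exact shape of an application/sdpfrag body, and whether ufrag/pwd must be repeated were all still open questions at this point, and the addresses are invented.

```
INFO sip:callee@example.com SIP/2.0
Info-Package: trickle-ice
Content-Type: application/sdpfrag

a=mid:audio
a=candidate:2 1 UDP 1694498815 203.0.113.5 6000 typ srflx raddr 192.0.2.1 rport 5000
```

The body is a fragment keyed by a=mid (per the "just mid for simplicity" proposal above), read as a diff against the previously exchanged SDP rather than a replacement for it.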
Christer (cont.): Make sure the INFO is sent only when it can be sent.
Paul: There's also the issue of when INFO can be sent.
(Cullen explores forking, with at least one proposal to include ufrag/passwd in trickle ICE fragments)
Emil: Currently, we say that the ICE re-invite only does the first checklist, and activates the other ones only after it completes. I think we can ignore that.
EKR: No, you'll end up with a storm of ICE checks, which isn't really what we want. What you should do is: if you get a candidate, check to see if anything is running. If not, start the first checklist to which the candidate belongs. If you do that, you behave much more like ICE, and things like suicide packets are much more likely to succeed.
Martin: That's one way to do it. Or you could check just one pair (of the same foundation) for each checklist.
EKR: That's a big change in behavior.
Martin: But it would work better.
EKR: Given what I expect to see -- all candidates arriving in a group -- you're going to serialize all the checks. That's how ICE is meant to behave.
Emil: Doesn't it activate all of the checklists after the first one completes?
EKR: No.
(Further conversation around how this is all supposed to behave)
Jonathan: The principle should be that when a candidate comes in, if it would have been unfrozen, then unfreeze it. Otherwise, it should be frozen.
EKR: We really need to sit down, work through this, and write it down.
Action item: EKR and Martin to propose text.
Jonathan: What happens if the other side never sends end-of-candidates?
Emil: This isn't any worse than normal ICE.
Cullen (Chair): Please send something to the list describing the problem and proposed solutions.
Flemming: There are two things here. One should probably go to DISPATCH.