[rtcweb] jsep 01 comments

"Ejzak, Richard P (Richard)" <richard.ejzak@alcatel-lucent.com> Thu, 05 July 2012 19:42 UTC

From: "Ejzak, Richard P (Richard)" <richard.ejzak@alcatel-lucent.com>
To: "rtcweb@ietf.org" <rtcweb@ietf.org>
Date: Thu, 05 Jul 2012 19:42:53 +0000
Message-ID: <03FBA798AC24E3498B74F47FD082A92F177082F7@US70UWXCHMBA04.zam.alcatel-lucent.com>

I agreed to review jsep 01 during the interim. Please see my comments below against version 01 of the jsep draft.

In the paragraph before figure 2 in section 4.2, the text justifies the notion of multiple outstanding offers and answers based on RFC 3264. This is incorrect. RFC 3264 very clearly describes (in section 4) that an offer cannot be updated until an answer is received and that an answer is final. Jsep should say so explicitly. It should also state the actual justifications for multiple outstanding offers and answers and explain how their use fits within the context of RFC 3264. Specifically within the context of SIP, there is no justification or support for multiple offers, and the only justification for multiple answers is to provide a limited mechanism for dealing with some SIP forking cases.
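To illustrate the RFC 3264 constraint described above, here is a minimal sketch of an offer/answer exchange that refuses a second offer while one is outstanding. The class, method, and state names are my own assumptions for illustration and are not taken from the jsep draft.

```python
class OfferAnswerSession:
    """Toy model of the RFC 3264 section 4 rule: at most one outstanding offer."""

    def __init__(self):
        # State names are hypothetical labels chosen for this sketch.
        self.state = "stable"

    def send_offer(self):
        # A new offer is only allowed when no offer is outstanding in either direction.
        if self.state != "stable":
            raise RuntimeError("cannot send offer: an offer is still outstanding")
        self.state = "have-local-offer"

    def receive_offer(self):
        if self.state != "stable":
            raise RuntimeError("cannot accept offer: an offer is still outstanding")
        self.state = "have-remote-offer"

    def receive_answer(self):
        # The answer is final and closes the transaction.
        if self.state != "have-local-offer":
            raise RuntimeError("no local offer is awaiting an answer")
        self.state = "stable"

    def send_answer(self):
        if self.state != "have-remote-offer":
            raise RuntimeError("no remote offer is awaiting an answer")
        self.state = "stable"
```

Under this model a second send_offer() before receive_answer() raises an error, which is the behavior that any support for multiple outstanding offers would have to justify departing from.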

The 2nd paragraph of section 4.5 should make clear that XMPP is an example of an application that supports the capability (ICE trickling) and that SIP is not.

In section 4.7, it should be made clear that SIP forking is being considered (with clear references), along with any XMPP analogs, if they exist. Section 4.7.2 should indicate cloning as a possible strategy to fully address parallel forking, which is not currently supported. The text should explain that, in the absence of cloning, ICE and DTLS may not begin until a "winning" target is selected, thus potentially delaying the start of media flow, whereas ICE and DTLS could proceed in parallel if cloning were supported.

Section 4.8 needs updating. It is not clear from the description if a new PeerConnection needs to be established (or how to re-establish the old one) and how to ensure that the new PeerConnection will be compatible with the old LocalDescription. I think the description should say that PeerConnection is recreated using the old parameters and (reconstructed?) MediaStreams. Something is also needed to describe how to ensure the continued availability of the old MediaStreams. Rehydration will require end-to-end offer/answer to recreate the PeerConnection anyway, and I'm not sure of the value of attempting to reuse the old LocalDescription since it may not make sense to force some of the old attributes. Some discussion of how to ensure this is needed if the concept of reusing the old LocalDescription (and MediaStreams) is retained.

In section 5.1.1, the first paragraph describes createOffer the first time it is invoked but not necessarily its behavior for subsequent invocations. The last sentence of the 2nd paragraph is inconsistent with the last sentence of the first paragraph under some circumstances. The local description is used to populate the offer once the session is established, but it may come in three flavors (noting that an offerer may later become an answerer for the same PeerConnection and vice versa): 1) the local description set by the prior offerer before receiving a final answer (thus reflecting all supported codecs and capabilities according to the constraints); 2) the local description set by the prior answerer (thus reflecting only the selected codecs and capabilities); and 3) the local description set by the prior offerer but where createOffer occurs after receiving the final answer to the previous offer (in this case the offer may include codecs and capabilities already released by the browser).

The last paragraph of section 5.1.1 describes that in case 3) the offer should be modified to reflect the "current state of the system". It's not completely clear if this is intended to describe the current resources allocated to the PeerConnection or the potential resources that could be allocated (i.e., the full set of available codecs and capabilities). It is similarly unclear for case 2) if the offer is to be changed and how.

The last sentence of the second paragraph of section 5.1.1 is also inconsistent with the last paragraph of the section and it is not clear which one takes precedence.

This potential inconsistency in the behavior of createOffer is problematic to an application since the range of capabilities reflected in the SDP is state dependent. The result of createOffer should not depend on whether the browser was previously offerer or answerer and it should be possible to request either current capabilities or the complete list of potential capabilities for each unchanged m line in the offer using the constraints parameter.

I was asked during the interim to justify the need for a "full" offer as compared to an offer that just reflects the current capabilities. There are many examples in SIP where a node is sent an "empty re-INVITE" or a "REFER with REPLACES" during a session to trigger the node to send a new SDP offer to renegotiate media. This may be triggered due to an interface change, a network server (e.g., acting as B2BUA) moving the media to another endpoint such as a conference server, etc. In all these cases there is no guarantee that the new target will support the same codecs or have the same capabilities, so the best way forward is to create a "full" offer to make it most likely that an acceptable media configuration can be achieved in a single offer/answer transaction. There are other cases where it may be acceptable to send a "minimal" offer, but only the application can determine which one is needed.
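The distinction between a "full" and a "minimal" offer can be sketched as follows. This is only an illustration: the function name, the mode parameter, and the codec lists are hypothetical, and nothing here is an existing or proposed jsep API.

```python
def codecs_for_offer(supported, negotiated, mode):
    """Pick the codec list for one unchanged m line of a new offer.

    supported:  every codec the browser could offer, in preference order
    negotiated: codecs selected by the previous offer/answer exchange
    mode:       "full" or "minimal" (hypothetical application-chosen flag)
    """
    if mode == "full":
        # Offer everything, so a new target with different capabilities
        # can most likely be accommodated in a single offer/answer exchange.
        return list(supported)
    if mode == "minimal":
        # Offer only what is already in use, keeping the preference order.
        return [codec for codec in supported if codec in negotiated]
    raise ValueError("mode must be 'full' or 'minimal'")
```

For example, with supported codecs ["opus", "G722", "PCMU"] and only PCMU negotiated, the "full" mode yields all three codecs while the "minimal" mode yields only PCMU. The point is that the choice is made by the application rather than by the prior offerer/answerer role.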

The 3rd paragraph of 5.1.2 says that createAnswer should also reflect the "current state" of the system. It is very confusing to use the same words to describe createOffer and createAnswer since they should not mean the same thing. The text should be clarified to distinguish between these two cases.

There is a great need for a new section (possibly a subsection of 4) that describes the relationship between MediaStreams and media lines. An rtcweb MediaStream is not the same as an RFC 3264 media stream, thus causing endless potential for confusion (I admit to being one of the victims). An RFC 3264 media stream is more akin to a MediaStreamTrack, and even that is not completely accurate since we will allow multiple MediaStreamTracks per m line. How does the browser decide how to allocate MediaStreamTracks to m lines? The easy solution is to assign each track to a separate m line but this is wasteful when multiple tracks carry the same media type. But not all tracks of the same media type should be forced to use the same m line since it may be necessary to negotiate different capabilities for these tracks. It must somehow be made clear to the browser which tracks can be combined on an m line and which cannot so that it can generate an appropriate offer with the proper number of m lines.
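One possible way to express which tracks may share an m line is sketched below, assuming the application attaches a grouping label to each track. The Track type, its group field, and plan_m_lines are hypothetical names for illustration and do not correspond to anything in the draft or the W3C API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    track_id: str
    kind: str   # media type, e.g. "audio" or "video"
    group: str  # hypothetical app-supplied label: tracks with the same
                # kind and group may be combined on one m line


def plan_m_lines(tracks):
    """Return one (kind, [track ids]) entry per m line, in first-seen order."""
    m_lines = {}
    for track in tracks:
        # Tracks are combined only if both media type and group label match.
        m_lines.setdefault((track.kind, track.group), []).append(track.track_id)
    return [(kind, ids) for (kind, _group), ids in m_lines.items()]
```

With one audio track, two video tracks labeled "camera", and one video track labeled "screen", this produces three m lines: one audio, one video carrying both camera tracks, and a separate video line for the screen track so that it can negotiate different capabilities.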

In section 5.1.4, it should be pointed out that the need to support both old and new local descriptions means that the PeerConnection must in some sense be "cloned" since there may be completely different remote candidates and codecs selected for use with the new description and both configurations need to be simultaneously supported for an interim period. It is also not explained how to remove the new configuration if the 2nd offer/answer fails. I do not think that the state diagram supports a transition from offer state or pranswer state back to stable state without processing a valid answer. Note this is also another implicit meaning of pranswer that should be described for subsequent offer/answer transactions - both the old and new configurations need to be supported until receipt of final answer or failure is indicated.

Sections 5.1.6 and 5.1.7 should clarify the relationship between the local/remote description and the actual resources allocated. One of them usually reflects the configured resources while the other is generally a superset. If this is not the intended semantic, then that should also be clarified.

In section 5.2, I believe the RTP header extension attribute in RFC 5285 is a=extmap rather than a=rtphdr-ext.
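For reference, RFC 5285 defines the attribute as a=extmap:<value>["/"<direction>] <URI> <extensionattributes>. A minimal parsing sketch follows; the function name and the returned dictionary layout are my own choices for illustration, and the sketch does no validation beyond the basic shape of the line.

```python
def parse_extmap(line):
    """Split an RFC 5285 a=extmap line into its components."""
    prefix = "a=extmap:"
    if not line.startswith(prefix):
        raise ValueError("not an a=extmap attribute")
    # At most three fields: id[/direction], URI, optional extension attributes.
    parts = line[len(prefix):].split(" ", 2)
    if len(parts) < 2:
        raise ValueError("a=extmap requires an id and a URI")
    ext_id, _, direction = parts[0].partition("/")
    return {
        "id": int(ext_id),
        "direction": direction or None,
        "uri": parts[1],
        "attributes": parts[2] if len(parts) > 2 else None,
    }
```

For example, parse_extmap("a=extmap:1 urn:ietf:params:rtp-hdrext:toffset") yields id 1 with that URI and no direction or extension attributes.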

Richard
