Re: [rtcweb] Review of draft-ietf-rtcweb-use-cases-and-requirements-04.txt

Dan,

Many thanks for the great input! I've updated the document and will 
shortly announce a new version (-05).

I have not incorporated all of your proposals; see inline for details.

Thanks,
Stefan

On 2011-09-08 11:18, Dan Burnett wrote:
> A while ago I promised a full read-through review of the use cases and requirements document [1], primarily from the API perspective, but I have included other comments as well.
> The comments follow the order of the document.  Some are editorial, and some are more substantive.
>
>
> Section 4:  As a general comment, the use cases occasionally stray more into implementation rather than just being worded in terms of user needs.  This has driven some of my wording change suggestions below.
>
>
> 4.2.1.1:  The wording is a bit unclear as to whether this use case is for only a single peer-to-peer connection or for multiple connections.  In particular, it points out that for a session there is a self-view and a remote view, but it's not clear at that point whether there might be *multiple* remote views simultaneously in the session.  However, later on in this use case it states that "Any session participant can end the session at any time."  Then there are what appear to be examples of different users, but it is not clear whether it only needs to be possible for each of these kinds of users to be supported (singly), or whether it must be possible to support communication with all simultaneously.
>
> Since there is a separate use case for multiparty video communication (4.2.7), I believe this use case should be cleaned up a bit.  I suggest the following text for this use case:
>
> ******
> Two or more users have loaded a video communication web application into their browsers, provided by the same service provider, and logged into the service it provides.  The web service publishes information about user login status by pushing updates to the web application in the browsers.  When one online user selects a peer online user, a 1-1 video communication session between the browsers of the two peers is initiated.  The invited user might accept or reject the session.
>
> During session establishment a self-view is displayed, and once the session has been established the video sent from the remote peer is displayed in addition to the self-view.  During the session, each user can select to remove and re-insert the self-view as often as desired.  Each user can also change the sizes of his/her two video displays during the session.  Each user can also pause sending of media (audio, video, or both) and mute incoming media.
>
> It is essential that the communication cannot be eavesdropped.
>
> Either session participant can end the session at any time.
>
> The two users may be using communication devices of different makes, with different operating systems and browsers from different vendors.
>
> One user has an unreliable Internet connection that sometimes loses packets and sometimes goes down completely.
>
> One user is located behind a Network Address Translator (NAT).
> ******

I updated according to your proposal.

>
>
> 4.2.3.1:  I recommend some minor editorial changes, so that the second paragraph reads
>
> ******
> The communication device used by one of the users has several network adapters (Ethernet, WiFi, Cellular).  The communication device is accessing the Internet using Ethernet, but the user has to start a trip during the session.  The communication device automatically changes to use WiFi when the Ethernet cable is removed and then moves to cellular access to the Internet when moving out of WiFi coverage.  The session continues even though the access method changes.
> ******

I updated accordingly.

>
>
> 4.2.4.1:  "previos" ->  "previous".  Also, the first use of "QoS" should define the term, as in "Quality of Service (QoS)".

Fixed.

> Actually, QoS is more a derived functional requirement than a use case, especially when that specific term is used anywhere near IETF folks.  If what the user wants is that the call continues at best available quality (to the possible detriment of other users of the same cell/dsl/whatever), we should say so. It may be the best way to do this lies in the codec or protocol and not using existing QoS methods.

I tend to agree. Maybe this part should be removed. It was originally 
added after input from Cullen (who wanted to be able to use QoS support 
in residential GWs).

>
>
> 4.2.5.1:  We should clarify that *in this use case*, the service providers are choosing to exchange no more information about the users than what can be carried using SIP.  In other words, this is not suggesting that all RTCWeb/WebRTC web application service providers must restrict themselves only to exchanging information that can be carried via SIP (whatever SIP means in this situation).  For example, in general the interoperability of sites could be done through any IM protocol, e.g., combined with, say, oauth for identity. We should not be mandating or preferring (even by implication) any specific protocol.  If websites choose to export presence and identity to support interoperability that is up to them and does not necessarily require that the RTCWeb API provide such a mechanism.

I fully agree with the above. However, I did not have the energy to 
update the wording in the document (it now basically says that more 
work is needed to define this).

> I almost think that this implies a new, more precise requirement that the Web API MUST NOT prevent two webapps that happen to choose to peer with SIP from peering.  That makes clearer what our baseline minimum is without restricting the peering mechanisms of all webapps.  I say "clearer" rather than "clear" because "peer with SIP" is itself not very precise, but I still think it's better than what we have now.

Do you mean a new requirement, or should F24 be re-phrased?

Also, for clarity, my view is that what can be exchanged using SIP is 
not limited to the signaling messages PeerConnection produces/consumes. 
Only certain messages (related to establishing streams and setting up 
connections) make sense to the PeerConnection object; others can be 
webapp-webapp messages (in this case carried by SIP) that do not 
originate from, and are never fed into, the PeerConnection.

>
> I suggest a minor rewording of
> "Each web service publishes information about user login status for users that have a relationship with the other user; how this is established is out of scope."
>    to something more concrete, e.g.,
> "For each user Alice who has authorized another user Bob to receive login status information, Alice's service publishes Alice's login status information to Bob.  How this authorization is defined and established is out of scope."

This part I've updated.

>
>
> 4.2.6.1:  "thumbnail ot" ->  "thumbnail of".  "can not" ->  "cannot"

Updated.

>
>
> 4.2.7.1:  "simple video communication service" needs to reference 4.2.1.

Right. Updated.

>
>
> 4.2.8.1:  The description should begin with "This use case is based on the previous one."  Also, "can not" ->  "cannot" and "sound of the tank, that file" ->  "sound of the tank; that file".

I updated accordingly.

> More substantially, the note in this section strongly suggests that the WebRTC/RTCWeb groups must be responsible for the mixing of sound objects with streams before rendering.  It might be clearer to state that our group's work MUST NOT prevent this and in fact should work with other groups' definitions of HTML5 audio rendering.

Yes, it does. And the previous section suggests that webrtc is 
responsible for making sure that audio streams can be spatialized.

I am not sure these are our tasks; the reason for putting these 
requirements into the document in the first place was to make sure that 
audio processing functions developed by other groups (e.g. the W3C Audio 
WG) can be applied also to the MediaStreams defined by WebRTC.

>
>
> 4.3.1.1:  "mobile phone used" ->  "mobile phone is used".  "can not" ->  "cannot".
> This use case is underspecified.  What does it mean for a user to "place and receive calls in the same way as when using a normal mobile phone"?  My mobile phone vibrates when I receive a call, and I can dial it by pressing and holding a digit on the keypad.  I don't even have a SIP softphone on my desktop that can do either one.  The login must also allow the user to manage their account, pay bills, add services, etc.  More interestingly, it should be possible to write a portal web app that, once the user is logged in, does not require the user to submit an additional set of credentials to access the phone functionality.

I agree. I added a note clarifying what was meant. What do you think?

>
>
> 4.3.2.1:  I don't believe this use case goes far enough.  The phone experience should be sufficiently embedded in the page that the user's context can be passed with the call, possibly resulting in a deep dial into an IVR tree or a customer service representative not having to ask questions that the user has already answered at the website level. The key here is that we should be aspiring to a user experience that is *better* than that of a PSTN call, not just equivalent.

Could you help produce more text for this one?

>
>
> 4.3.3.1:  "can not" ->  "cannot".  "All participant are authenticated" ->  "All participants are authenticated". "There exists several" ->  "There exist several".  "one low resolution stream, the" ->  "one low resolution stream, and the".  "c) each browser" ->  "or c) each browser".  "just an high" ->  "just a high".  "reslution" ->  "resolution".

Fixed.

> Also, we should probably note in this use case that the spatialization could not only happen as part of the server-side mixing but also by having the server tag the stream with spatialization info and having the browser render it.

As the use case is now written (only one audio stream from server to 
browser), it seems that the spatialization must be done in the server. 
The question is then whether the use case should be changed to allow 
for client-side mixing/spatialization.

>
>
> F2:  "in presence of" ->  "in the presence of"

Fixed

>
>
> F5:  ditto

Fixed

>
>
> F8:  "any more" ->  "anymore"

Fixed.

>
>
> F15:  I think this is venturing out of scope.  Perhaps a better phrasing is "The webrtc browser component MUST interoperate with other HTML5 methods for processing and mixing sound objects (media that is retrieved from another source than the established media stream(s) with the peer(s) with audio streams)."

I agree it is venturing out of scope for these WGs, as is the panning 
part of F13. But as phrased it is a requirement on the browser, and as 
such I think it is correctly phrased.

>
>
> F18:  While support for a minimum common codec is important, requiring it to be commonly supported by existing legacy telephony services is technically only a nice-to-have feature.  One might consider gsm610 as an alternative, for example.

I guess the codecs will be discussed a lot in the coming months. You can 
go either way, from saying that "a minimum common codec is a nice to 
have feature" (you could always transcode) to requiring codecs and even 
certain profiles (AVP/AVPF/SAVPF discussion) so that interop can be 
accomplished without any GW at all. I left it as is for the time being.

>
>
> F19:  The first letter needs to be capitalized.

Done

>
>
> F24:  "carried in SIP" is not sufficiently precise.  More clarity here might improve some of the discussions we are currently having.

I agree. Harald supplied this requirement, and it would be good if he 
could supply a more precise requirement (along with text for 4.2.5).

>
>
> F26:  "in presence of" ->  "in the presence of"

Done.

>
>
> General comment about all of the API requirements in section 5.3:  they are not written as API requirements, but as *web application* requirements.  Since many of the requirements on the web application could be met through means other than the WebRTC API, it is easy for people to agree with the requirement but strongly disagree on whether the API needs to be the *mechanism* by which the requirement is satisfied.  Although I have not reworded all of the requirements below, I think it would be much clearer if we only wrote the requirements that the Web API itself must satisfy as "The Web API MUST ...".  For example, "The Web API MUST inform a web application when a stream from a peer is no longer received."

I think this makes sense. I tried to update the document accordingly.
However, I did not update A18, since to me that requirement is partly on 
the API but can partly be solved without API involvement. I don't want 
to pre-empt this discussion by changing the requirement.
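
To make the "The Web API MUST ..." style concrete, the example 
requirement quoted above (informing the web application when a stream 
from a peer is no longer received) could look roughly like the sketch 
below. This is only an illustration: the class StreamMonitor, its 
methods, and the timeout-based detection are all hypothetical and are 
not taken from the draft or from any existing API.

```typescript
// Hypothetical sketch: notify the web application when a peer's stream
// stops arriving. Detection by inactivity timeout is an assumption.
type StreamEndedListener = (streamLabel: string) => void;

class StreamMonitor {
  private timeoutMs: number;
  private listeners: StreamEndedListener[] = [];
  private lastSeen = new Map<string, number>();

  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // The web application registers interest in "stream no longer received".
  onStreamEnded(listener: StreamEndedListener): void {
    this.listeners.push(listener);
  }

  // Called whenever media for the given stream arrives.
  mediaReceived(streamLabel: string, nowMs: number): void {
    this.lastSeen.set(streamLabel, nowMs);
  }

  // Called periodically; reports each silent stream exactly once.
  check(nowMs: number): void {
    const ended: string[] = [];
    this.lastSeen.forEach((seenAt, label) => {
      if (nowMs - seenAt > this.timeoutMs) {
        ended.push(label);
      }
    });
    ended.forEach((label) => {
      this.lastSeen.delete(label);
      this.listeners.forEach((listener) => listener(label));
    });
  }
}
```

Whether such a notification belongs in the API itself or can be left to 
the application is exactly the kind of question the rewording is meant 
to expose.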

> I suspect that this will help make clear where we disagree on which requirements must be addressed by the Web API itself and which must merely not be prevented by the Web API (and thus could be satisfied external to the WebRTC API).
>
>
>
> A8 and A10:  It would be good to clarify here somewhere what the difference is between pause/resume and cease/start for a stream.

I think that, given the current API, it would perhaps be better to put 
it as "it must be possible to mute streams locally. The mute/unmute 
state must be preserved when a stream is sent to a peer" and to remove 
pause/resume. I've now changed the document along these lines. It 
also means that A8 is removed.
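
As a rough sketch of what "the mute/unmute state must be preserved when 
a stream is sent to a peer" could mean in practice, consider the 
following. The StreamState shape, the function names, and the JSON 
encoding are all assumptions made for illustration; none of them come 
from the current API proposal.

```typescript
// Hypothetical stream model; field and function names are illustrative only.
interface StreamState {
  label: string;
  muted: boolean;
}

// Mute or unmute locally, returning a new state object.
function setMuted(stream: StreamState, muted: boolean): StreamState {
  return { label: stream.label, muted: muted };
}

// What the sending side would convey alongside the media
// (JSON is an assumed encoding, chosen only for the sketch).
function describeForPeer(stream: StreamState): string {
  return JSON.stringify({ label: stream.label, muted: stream.muted });
}

// What the receiving side reconstructs: the sender's mute state survives.
function receiveFromPeer(description: string): StreamState {
  const parsed = JSON.parse(description) as { label: string; muted: boolean };
  return { label: parsed.label, muted: parsed.muted };
}
```

For example, a stream muted locally with setMuted(stream, true) and 
then passed through describeForPeer and receiveFromPeer is still 
marked as muted at the receiving end.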

>
>
> A14:  As written this is not entirely in scope.  Perhaps the following phrasing would be more accurate?
>
> "The Web API MUST NOT prevent panning, mixing, and other processing for individual streams."

I did nothing about this at this time. I think it must first be sorted 
out which parts of it are in scope.

>
>
> A15:  This requirement is too specific in terms of how identifiers are shared.  Would the following perhaps be more accurate?
>
> "For each stream, the Web API MUST provide to both parties of the communication an identifier for the stream that is a) the same at both ends, b) serializable, and c) unique relative to all other stream identifiers in use by either party."

I rewrote this requirement; what do you think?
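
For reference, the three properties in the proposed wording above (the 
same at both ends, safe to pass from one party to the other and back, 
and unique among the identifiers in use by either party) can be 
illustrated with a small sketch. Everything here is hypothetical: the 
StreamId shape, the function names, and the choice to build the 
identifier from an endpoint label plus a local counter are assumptions 
for illustration, not something the draft specifies.

```typescript
// Hypothetical model of an A15-style stream identifier; not from any draft.
interface StreamId {
  origin: string; // label of the creating endpoint (assumed unique per party)
  serial: number; // counter local to that endpoint
}

// Returns a function that hands out fresh identifiers for one endpoint.
function makeStreamIdFactory(origin: string): () => StreamId {
  let next = 0;
  return () => ({ origin: origin, serial: next++ });
}

// Text form that can be carried over any signaling channel.
function encodeStreamId(id: StreamId): string {
  return JSON.stringify([id.origin, id.serial]);
}

// Inverse of encodeStreamId; rejects anything that is not a valid encoding.
function decodeStreamId(text: string): StreamId {
  const parsed: unknown = JSON.parse(text);
  if (
    !Array.isArray(parsed) ||
    parsed.length !== 2 ||
    typeof parsed[0] !== "string" ||
    typeof parsed[1] !== "number"
  ) {
    throw new Error("malformed stream id");
  }
  return { origin: parsed[0], serial: parsed[1] };
}

// Two identifiers denote the same stream only if both parts match.
function sameStreamId(a: StreamId, b: StreamId): boolean {
  return a.origin === b.origin && a.serial === b.serial;
}
```

Because the identifier is plain data, it can be handed to the other 
party and back again without changing, which is the property Dan 
describes with "serializable".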

>
> The word "serializable" is not exactly correct, but the idea I'm trying to convey is that the identifier can safely be passed from one party to the other and back again, via WebRTC calls or otherwise.
>
>
> A16:  A minor nit here -- we probably should not use the word "datagram" at this stage because of its implementation implications.  What about "In addition to the streams listed elsewhere, the Web API MUST provide a mechanism for sending and receiving isolated discrete chunks of data."

Correct.

>
>
> A17:  Another minor nit -- presumably this only applies when the signal is audio.  Maybe we could reword as "For streams of type audio, it MUST be possible for the web application author to indicate, via the Web API, when the stream is speech."

Elsewhere I used the term "stream component", so I used it here as well.

>
>
> 7.2:  All but the last paragraph here should be written as requirements in section 5.2, not in the security considerations section.  They need to be not security afterthoughts but primary requirements for implementations.
> Additionally, I think we should be more explicit about consent revision to include revocation, i.e., "The browser is expected to provide mechanisms for users to revise and even completely revoke consent to use device resources such as cameras and microphones."
> Along the same lines, I believe we also discussed at the WebRTC meeting in Quebec that the browser should provide a user-visible security indicator (such as a padlock) indicating the encryption level of the session.  Maybe this should be a requirement?
> Also, "browser is needs" ->  "browser needs".

I left this for further discussion (except for adding the revocation 
part) as it seems the security study has not gotten that far yet. I 
think at a later stage we will move this into section 5.2.

>
>
> 7.3:  This should be a requirement in section 5.3.

I don't agree; this is currently a requirement (of sorts) on the 
_application_, not on the browser or API.

>
>
> Thanks,

Many thanks for useful input.

>
> -- dan
>
> [1] http://www.ietf.org/id/draft-ietf-rtcweb-use-cases-and-requirements-04.txt