Re: [rtcweb] Comments on use case draft

Randell Jesup <> Tue, 30 August 2011 15:07 UTC


On 8/28/2011 4:07 PM, Stephan Wenger wrote:
> 4. Section This use case is not described in sufficient detail.
> At least two scenarios are possible.  First, both front and rear camera
> send individual video streams (potentially at different resolutions), and
> the PIP mixing happens in the receiving browser.  This would be a user
> interface issue and no mechanisms need to be specified in the IETF beyond
> being able to send more than one video stream (though there may be need
> for related API work).  Second, the PIP mixing happens in the sending
> phone, and only one stream is being sent.  In this case, I believe not
> even API work is necessary.
> Suggest reconsidering whether this use case is relevant enough to be
> kept.  Multi-camera systems able to send coded samples from both
> cameras simultaneously are rather exotic today (only telepresence rooms
> come to my mind).
I think this directly talks to a common case that would interest many news
organizations: personal news reporting.  CNN iReport, even local news channels -
they love having users act as reporters-on-the-spot, and having the video from
both cameras is a big win for them (and for the producers working from such
footage, who could swap the PIPs or suppress one or the other on the fly).
And most newer Android phones and iPhones have two cameras.

Could I live with losing this use case?  Yes, with pain.  I do want to support
multiple streams so you can have the user and a local video (either a file, a
camera, or a video encoding of a desktop or window).  I'll note this use case
only hits the two-cameras part of what I'd like to see, but I don't know that
we need to go into that detail here (i.e. where a stream can come from, other
than a camera, is not something we need to mandate here - I think we just need
to call out the need for two streams).
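At the session level, "two streams" really just means two concurrently
negotiated video streams.  A tiny sketch (the SDP fragment and payload types
below are purely illustrative, not from the draft):

```javascript
// Hypothetical SDP fragment for a session carrying two simultaneous
// video streams (e.g. front and rear camera) plus one audio stream.
const sdp = [
  "v=0",
  "m=audio 49170 RTP/AVP 0",
  "m=video 49172 RTP/AVP 96", // front camera
  "m=video 49174 RTP/AVP 96", // rear camera
].join("\r\n");

// The requirement this use case imposes on the protocol machinery:
// more than one video m-line (stream) per session.
function videoStreamCount(sdpText) {
  return sdpText.split("\r\n").filter((l) => l.startsWith("m=video")).length;
}

console.log(videoStreamCount(sdp)); // 2
```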

> 5. Section Why are the sending peers restricted to mono audio?
> Spatial arrangement is not very complex for stereo as well...
> 6. Section What's really necessary here is a mechanism that
> allows a user to tell a browser that VERY tight cross-signal sync and VERY
> low delay is required, which may trigger different jitter buffer handling
> and such.  Beyond that, I believe that audio codec negotiation may be
> helpful.  Audio professionals (like musicians) are somewhat more picky
> when it comes to these technology selections than normal users.  I would
> not be surprised if we would learn that there is a real market requirement
> for uncompressed or lossless audio if this use case takes off.
Distributed music band - that needs TIGHT N-browser N-stream sync and
VERY LOW (and virtually constant, i.e. LAN-like) delay.  I do not believe this
is likely to be technically feasible at a quality users would accept.

It would likely need ultra-low delay jitter buffers, much lower packetization
sizes, uncompressed audio, etc.

Distribution of music to multiple playback stations: more possible; the sync
requirements are somewhat relaxed, and the ultra-low delay is relaxed much more.
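A back-of-envelope one-way delay budget shows why the band case is so hard.
Every number below is my own illustrative assumption, not a measurement:

```javascript
// Rough one-way delay budget for the "distributed band" use case.
// All values in milliseconds; all are illustrative assumptions.
const budgetMs = {
  captureAndEncode: 5,  // aggressive low-delay codec configuration
  packetization: 2.5,   // very small frames (vs. the usual 20 ms)
  network: 20,          // optimistic one-way WAN latency
  jitterBuffer: 5,      // ultra-low-delay buffer, little loss tolerance
  decodeAndPlayout: 5,
};

const total = Object.values(budgetMs).reduce((a, b) => a + b, 0);
console.log(total); // 37.5 -- already past the ~25 ms often cited as
                    // the tolerance for tight ensemble playing
```

Even with every component pushed to an optimistic extreme, the WAN hop alone
nearly consumes the budget.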

Let's drop this one, please.

> Section I have a number of issues with this use case.
> First, in contrast to most other use cases, this one enters solution space
> quite prominently.  That wouldn't be an issue for me if the solution my
> employer is favoring were mentioned here, but it is not :-(.  To cure my
> immediate concern, one suggestion would be to remove references to
> simulcast and/or add references to spatial scalability.  However, perhaps
> it's better to describe the behavior of the multipoint system in terms of
> user experience rather than technology choice.
Generally I agree.  The use case could be more user-oriented.  What are we
targeting for the space here: a conference system tightly tied to an application
in the browser, or a generic conference system that would allow better operation
when you have "interop" calls between services - i.e. can someone on Facebook
join an rtcweb Hangout hosted on Google?  These choices would drive different
requirements derived from the use case.

Also, we're describing things an application *could* do with the system, not the only
or even preferred way to do a conference.  This use case states that someone *could*
build a "dumb" conference server that does no re-encoding, just stream selection and
forwarding.  It doesn't prevent a conference server that re-encodes, nor a conference
system using SVC or equivalent that subsets the incoming stream.

The application could request that a second, smaller stream be sent, though
obviously this presumes knowledge of the server's design, so it would be more
tightly tied to the conference server implementation.  I'm wondering if there
would be a good way to say "find a way to deliver a low- and high-resolution
image", and let the system (rtcweb) figure out how to do it given the shared
codecs available.  (I.e. SVC in a particular config if both the conf server and
browser support it, simulcast streams if they don't.)
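A sketch of that "let the system figure it out" idea.  The capability names
here are hypothetical, not from any spec; the point is only the fallback
logic:

```javascript
// Given the capability sets the browser and conference server share,
// pick SVC when both support a scalable spatial configuration, and
// fall back to simulcast (low + high encodings) otherwise.
function chooseDelivery(browserCaps, serverCaps) {
  const shared = browserCaps.filter((c) => serverCaps.includes(c));
  if (shared.includes("svc-spatial")) {
    return { mode: "svc", streams: 1 };       // one scalable stream
  }
  return { mode: "simulcast", streams: 2 };   // two independent encodings
}

console.log(chooseDelivery(["svc-spatial", "simulcast"], ["svc-spatial"]));
// { mode: "svc", streams: 1 }
console.log(chooseDelivery(["simulcast"], ["svc-spatial", "simulcast"]));
// { mode: "simulcast", streams: 2 }
```

The application asks for "low and high resolution"; which wire format
satisfies that stays an implementation detail of the stack.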

> Second, why is audio mixed to stereo and not to something else, such as
> 5.1?
Remove the reference to the number of channels.  That's handled via negotiation.

> Third, the security stuff is not in any way technically bound to the rest
> of the use case, so I would farm it out into its own use case, and/or
> mention it as a "generic" feature... Remarks like "it is essential that
> the communication cannot be eavesdropped" would apply to pretty much all
> use cases, right?
Per someone else's comment: we can drop this and add a separate use case for
planning a bank robbery.  :-)  Better, though, might be a lawyer-client
conversation or a secret agent. :-)

> 7. Missing use case:
> It is my understanding that for regulatory compliance, in many developed
> countries, there will be a need for an E911 type of service *IF* the
> solution allows to "dial" an E.164 phone number.  I remember a controversy
> involving SKYPE in ca. 2005, and also having read about recent FCC
> hearings about this issue; for example,
> on-extending-e911-rules-to-oneway-outboundonly-voip-improve-location-capability-of-inteconnected-voip/.
> If there is a reasonable expectation that a webrtc service with outbound
> dialing capability in E.164 number-space requires E911 handling, then it
> does not make sense to stick our collective heads in the sand and ignore
> the issue.  I believe there is such an expectation; surely during the
> lifetime of a webrtc solution, but probably even during its introductory
> phase.
Agreed.  I've dealt with FCC rulings (and I'm sure other countries will have
similar and possibly even stricter requirements).  Basically, it will be very
likely that a provider will want to connect rtcweb to the PSTN (even if through
a gateway), and once that's done they'll need to support E911 services.

I could even see future expansion of the technologies for emergency calling;
you're seeing this now with emergency centers looking to support text messages.
So you might have emergency centers with rtcweb support directly or via a
translation/forwarding service (as voip is generally done today).  There is
significant advantage to an emergency center to be able to support video calls,
for example (though obviously there would be issues surrounding that).

> If E911 is relevant in this sense, then this issue needs to be addressed
> in section 4.3.1, 4.3.2, and perhaps 4.2.5.
> I understand that the editors did not address those use cases yet based on
> (presumably) lack of consensus, but I fear that IETF consensus is not the
> only relevant factor here.
> (I could mention "legal intercept" in the same context, but suggest to
> focus on emergency calls first, because a) they are easier to handle, b)
> more widely applicable, and c) generally agreed to be a useful thing, and
> therefore not quite as politically loaded.)

Yes.  :-)

Randell Jesup