Re: [rtcweb] Comments on use case draft

Dzonatas Sol, Tue, 30 August 2011 21:00 UTC

On 08/30/2011 08:06 AM, Randell Jesup wrote:
> On 8/28/2011 4:07 PM, Stephan Wenger wrote:
>> 4. Section This use case is not described in sufficient detail.
>> At least two scenarios are possible.  First, both front and rear cameras
>> send individual video streams (potentially at different resolutions),
>> and the PIP mixing happens in the receiving browser.  This would be a
>> user interface issue and no mechanisms need to be specified in the IETF
>> beyond being able to send more than one video stream (though there may
>> be need for related API work).  Second, the PIP mixing happens in the
>> sending phone, and only one stream is being sent.  In this case, I
>> believe not even API work is necessary.
>> I suggest reconsidering whether this use case is relevant enough to be
>> kept.  Multi-camera systems able to send coded samples from both
>> cameras simultaneously are rather exotic today (only telepresence rooms
>> come to my mind).
> I think this speaks directly to a common case that would interest many
> news organizations: personal news reporting.  CNN iReport, even local
> news channels - they love having users act as reporters-on-the-spot, and
> having the video from both cameras is a big win for them (and for the
> producers working from such footage, who could swap the PIPs or suppress
> one or the other on the fly).
> And most newer Android phones and iPhones have two cameras.
> Could I live with losing this use case?  Yes, with pain.  I do want to
> support multiple streams so you can have the user and a local video
> (either a file, a camera, or a video encoding of a desktop or window).
> I'll note this use case only hits the two-cameras part of what I'd like
> to see, but I don't know that we need to go into that detail here (i.e.
> where a stream can come from, other than a camera, is not something we
> need to mandate here - I think we just need to call out the need for
> two streams).
>> 5. Section Why are the sending peers restricted to mono audio?
>> Spatial arrangement is not very complex for stereo as well...
>> 6. Section What's really necessary here is a mechanism that
>> allows a user to tell a browser that VERY tight cross-signal sync and
>> low delay are required, which may trigger different jitter buffer
>> handling and such.  Beyond that, I believe that audio codec negotiation
>> may be helpful.  Audio professionals (like musicians) are somewhat more
>> picky than normal users when it comes to these technology selections.
>> I would not be surprised if we learned that there is a real market
>> requirement for uncompressed or lossless audio if this use case takes
>> off.
> Distributed music band - that needs TIGHT N-browser N-stream sync and
> VERY LOW (and virtually constant, i.e. LAN) delay.  I do not believe
> this is likely to be technically feasible such that users would accept
> it.  It would likely need ultra-low delay jitter buffers, much smaller
> packetization sizes, uncompressed audio, etc.
> Distribution of music to multiple playback stations: more possible; the
> sync requirements are somewhat relaxed, and the ultra-low delay
> requirement is relaxed much more.
> Let's drop this one, please.
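
To see why packetization size and jitter buffers matter so much here, a
one-way audio delay budget can be treated as roughly the sum of its
parts. The sketch below is illustrative only: the function names are
hypothetical, and every component value in the example (packet size,
jitter buffer depth, network and codec delay) is an assumption, not a
measurement.

```python
# Illustrative one-way audio delay budget. Function names are hypothetical
# and all example values are assumptions chosen for illustration.


def packetization_delay_ms(samples_per_packet: int, sample_rate_hz: int) -> float:
    """Time spent buffering audio to fill one packet before it is sent."""
    return 1000.0 * samples_per_packet / sample_rate_hz


def one_way_delay_ms(samples_per_packet: int, sample_rate_hz: int,
                     jitter_buffer_ms: float, network_ms: float,
                     codec_ms: float) -> float:
    """Sum of packetization, codec, network, and jitter-buffer delay."""
    return (packetization_delay_ms(samples_per_packet, sample_rate_hz)
            + codec_ms + network_ms + jitter_buffer_ms)


if __name__ == "__main__":
    # Assumed "ordinary call" settings: 20 ms packets at 48 kHz.
    print(one_way_delay_ms(960, 48000, jitter_buffer_ms=40,
                           network_ms=30, codec_ms=5))   # 95.0
    # Assumed "LAN band" settings: 2.5 ms packets, uncompressed audio.
    print(one_way_delay_ms(120, 48000, jitter_buffer_ms=5,
                           network_ms=1, codec_ms=0))    # 8.5
```

Under these assumed numbers, most of the budget in the ordinary case
comes from the jitter buffer and the network rather than the codec,
which is why the band case needs both small packets and a LAN-like path.
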
>> Section I have a number of issues with this use case.
>> First, in contrast to most other use cases, this one enters solution
>> space quite prominently.  That wouldn't be an issue for me if the
>> solution my employer is favoring were mentioned here, but it is not
>> :-(.  To cure my immediate concern, one suggestion would be to remove
>> references to simulcast and/or add references to spatial scalability.
>> However, perhaps it's better to describe the behavior of the multipoint
>> system in terms of user experience rather than technology choice.
> Generally I agree.  The use case can be more user-oriented.  What are we
> targeting for the space here: a conference system tightly tied to an
> application in the browser, or a generic conference system which would
> allow better operation when you have "interop" calls between services -
> i.e. can someone on Facebook join an rtcweb Hangout hosted on Google?
> These choices would drive different requirements to be derived from the
> use case.
> Also, we're describing things an application *could* do with the
> system, not the only or even preferred way to do a conference.  This
> use case states that someone *could* build a "dumb" conference server
> that does no re-encoding, just stream selection and forwarding.  It
> doesn't prevent a conference server that re-encodes, nor a conference
> system using SVC or equivalent that subsets the incoming stream.
> The application could request that a second, smaller stream be sent,
> though obviously this presumes a particular implementation, so it would
> be more tied to the conference server implementation.  I'm wondering if
> there would be a good way to say "find a way to deliver a low and high
> resolution image", and let the system (rtcweb) figure out how to do it
> given the shared codecs available.  (I.e. SVC in a particular config if
> both the conf server and browser support it, simulcast streams if they
> don't.)
>> Second, why is audio mixed to stereo and not to something else, such as
>> 5.1?
> Remove reference to the number of channels.  That's handled via
> negotiation as normal.
>> Third, the security stuff is not in any way technically bound to the
>> rest of the use case, so I would farm it out into its own use case,
>> and/or mention it as a "generic" feature... Remarks like "it is
>> essential that the communication cannot be eavesdropped" would apply to
>> pretty much all use cases, right?
> Per someone else's comment: we can drop this and add a separate use case
> for planning a bank robbery.  :-)  Better, though, might be a
> lawyer-client conversation or a secret agent. :-)
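
Randell's "find a way to deliver a low and high resolution image" idea
can be pictured as a small capability check: prefer SVC when both ends
share an SVC-capable codec, otherwise fall back to simulcast. This is a
hypothetical sketch only; the Endpoint record, the function name, and
the codec labels are assumptions for illustration, not anything rtcweb
specifies.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Endpoint:
    # Hypothetical capability record, not part of any rtcweb API.
    name: str
    codecs: FrozenSet[str]      # video codecs the endpoint can handle
    svc_codecs: FrozenSet[str]  # subset it can also handle as scalable (SVC)


def choose_two_resolution_delivery(
        browser: Endpoint, server: Endpoint) -> Optional[Tuple[str, str]]:
    """Pick how to deliver a low- and a high-resolution image.

    Prefers one SVC stream if both ends share an SVC-capable codec, else
    falls back to two simulcast streams in any shared codec. Returns
    (mode, codec), or None if the endpoints share no codec at all.
    Codecs are tried in alphabetical order only to keep the result
    deterministic; a real negotiation would follow a preference order.
    """
    shared_svc = sorted(browser.svc_codecs & server.svc_codecs)
    if shared_svc:
        return ("svc", shared_svc[0])
    shared = sorted(browser.codecs & server.codecs)
    if shared:
        return ("simulcast", shared[0])
    return None


if __name__ == "__main__":
    browser = Endpoint("browser", frozenset({"H264", "VP8"}), frozenset({"H264"}))
    svc_server = Endpoint("conf-svc", frozenset({"H264"}), frozenset({"H264"}))
    plain_server = Endpoint("conf-plain", frozenset({"VP8"}), frozenset())
    print(choose_two_resolution_delivery(browser, svc_server))    # ('svc', 'H264')
    print(choose_two_resolution_delivery(browser, plain_server))  # ('simulcast', 'VP8')
```

The point of the design is that the application only states the goal
(two resolutions); the choice between SVC and simulcast falls out of the
shared capabilities, as Randell suggests.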

Regarding PIP: if A signs B's security certificate and the media streams
swap, then there should also exist a certificate for A signed by B. By
yesterday's standard, that was not done automatically. Fully automating
that process in both directions is evidently a source of people's
anxiety. Maybe the generic feature for that, or one that enables that
process, belongs at the "ribbon-end-point".

Can we make this toy-story stack more obvious? Stacked in this order:
hypervisor (or grub), linux, X11, Xfce (GNOME/compiz optional),
Mono/.NET, winelib/wine/winnt (co-op), Internet Browser, Win 7/8 File
Manager. I think that makes clearer exactly what "the client" framework
is. Personally, I want "the client" to know only the schema, be SSRC
aware, and have reflection built into iNET, which I think is a given
(YMMV), though it is not so obvious what such a stack accomplishes
overall. We can make wider use of device driver certificates under such
a stack if we consider tablets (with a "pen") to be client-only.

Under RTP, I can imagine the pen certifying itself along with the
"ink"'d media, and continuing to certify any gesture recognition. Well, I
have imagined this for a dozen years now, yet realized my homework got
done and turned in... Do municipal banks invest in student tablets? Yes!
That is about what is left of municipalities, so now they like (or can
only afford) the fire-n-forget "upgrade when you break it" mode loaded
with academic licenses. Sound familiar?

I tried to get a grant written so that a local college would have
real-time DNS for local tablets and routers, letting homework and email
be sent around campus with zero configuration, but the procurement
process stopped our non-faculty vote. Dummy me: "___ ate my pen."

You know why war-dialing started...  accessibilities and solutions clean 
of plagiarism.

>> 7. Missing use case:
>> It is my understanding that for regulatory compliance, in many developed
>> countries, there will be a need for an E911 type of service *IF* the
>> solution allows users to "dial" an E.164 phone number.  I remember a
>> controversy involving Skype in ca. 2005, and also having read about
>> recent FCC hearings about this issue; for example,
>> on-extending-e911-rules-to-oneway-outboundonly-voip-improve-location-capability-of-inteconnected-voip/.
>> If there is a reasonable expectation that a webrtc service with outbound
>> dialing capability in E.164 number-space requires E911 handling, then it
>> does not make sense to stick our collective heads in the sand and ignore
>> the issue.  I believe there is such an expectation; surely during the
>> lifetime of a webrtc solution, but probably even during its introductory
>> phase.
> Agreed.  I've dealt with FCC rulings (and I'm sure other countries will
> have similar and possibly even stricter requirements).  Basically, it is
> very likely that a provider will want to connect rtcweb to the PSTN
> (even if through a gateway), and once that's done they'll need to
> support E911 services.
> I could even see future expansion of the technologies for emergency
> calling; you're seeing this now with emergency centers looking to
> support text messages.
> So you might have emergency centers with rtcweb support, either directly
> or via a translation/forwarding service (as VoIP is generally done
> today).  There is significant advantage to an emergency center in being
> able to support video calls, for example (though obviously there would
> be issues surrounding that).
>> If E911 is relevant in this sense, then this issue needs to be addressed
>> in sections 4.3.1, 4.3.2, and perhaps 4.2.5.
>> I understand that the editors did not address those use cases yet based
>> on (presumably) lack of consensus, but I fear that IETF consensus is not
>> the only relevant factor here.
>> (I could mention "legal intercept" in the same context, but suggest
>> focusing on emergency calls first, because they are a) easier to handle,
>> b) more widely applicable, and c) generally agreed to be a useful thing,
>> and therefore not quite as politically loaded.)
> Yes.  :-)

--- ---
Web Development, Software Engineering
Ag-Biotech, Virtual Reality, Consultant