Re: [rtcweb] [mmusic] WGLC of draft-ietf-rtcweb-use-cases-and-requirements-11

"Karl Stahl" <> Mon, 07 October 2013 20:16 UTC

From: Karl Stahl <>
To: 'Dan Wing' <>
Date: Mon, 07 Oct 2013 22:16:04 +0200

Quality (below)... and it's NOT all about bandwidth

Dan Wing wrote: 
> Unfortunately the industry doesn't yet seem to agree there is even a

Well, that depends on which part of "the industry" we seem to be in:

There seems to be more concern about quality for 3.5 kHz POTS voice
(pre-AM-radio quality) than for HiFi and HD video WebRTC with telepresence
capability... I think forward-looking carriers are concerned.

And when putting the media over our data-crowded Internet accesses (that is
the way it is/has to be with WebRTC, Skype etc.) there is certainly
competition for the bandwidth. TCP-type traffic is always
(intermittently) filling the pipe at the narrowest point, and packets are lost
(that is the way TCP flows share the bandwidth and make the Internet work),
but both TCP packets and UDP packets carrying media are lost. Level 3 (or
lower) QoS mechanisms (Diffserv, RSVP, prioritization and traffic shaping at
congestion points, most often the CPE/firewall/access router) help - if used!

- Carriers deploying POTSoIP with voice ending up in RJ11 ports on a CPE
(often part of a triple play service, delivering also data and IP-TV) always
have some sort of level 3 or below QoS, giving them loss-less voice media
transport. E.g. my ADSL2+ CPE classifies RJ11 voice and puts it on prioritized
level 2 ATM (53 byte cells!) until the DSLAM headend reassembles it into
prioritized VLAN Ethernet frames and some switch puts it onto an MPLS link,
which in this case actually carries IP packets with routable addresses. This
call may go to a 100 Mbps triple play Ethernet service where the voice comes
over a VLAN tagged Ethernet, not competing with data that is on another VLAN
(but often broken by some TDM peering point if more carriers are

- Today, with higher access speeds, there is a carrier trend to use IP-level
Diffserv QoS in the access (instead of separate level 2-2.5 networks), and
cable networks use RSVP for media sharing their Internet pipes.
That is sufficient and gives loss-less, low delay media transport.

QoS mechanisms at data crowded Internet accesses are required, even for
decent POTS voice. I have a good 2 Mbps upstream ADSL2+ Annex M access.
Using VoIP on my LAN without any QoS mechanism, 100 kbps voice is not even
possible for the remote end to understand when two kids are surfing and
file sharing, resulting in some 2000 open TCP flows on the same access.
After adding prioritization and traffic shaping in the ADSL modem/firewall
(a built-in E-SBC whose SIP proxy keeps track of SIP media and classifies it),
the voice gets as perfect as when I had the pipe to myself for my call only.

When SIP Trunking PBXs (i.e. connecting them to an ITSP's IP telephony access,
as we do with our Ingate and Intertex E-SBCs), there is often a separate
non-Internet pipe (maybe over MPLS) or an Internet access only used for the
SIP Trunk. But when we share an Internet pipe with TCP flow intensive data,
we certainly apply QoS: prioritization and traffic shaping (staying below
the access bandwidth by holding back data traffic). That is a must for
decent sound quality - even when you have plenty of bandwidth.
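
To make "prioritization and traffic shaping" concrete, here is a minimal
Python sketch. The class name, the queue model and the numbers are
assumptions made up for illustration, not how any particular E-SBC does it:
media is always served before data, and a token bucket keeps the total
egress just below the access rate, so data is held back here rather than
queuing up inside the modem.

```python
from collections import deque


class PriorityShaper:
    """Strict-priority scheduler with a token bucket on total egress.

    Hypothetical model: packets are represented only by their size in bytes.
    """

    def __init__(self, rate_bps, burst_bytes):
        self.rate_bytes_per_s = rate_bps / 8.0
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.media = deque()
        self.data = deque()

    def enqueue(self, size_bytes, is_media):
        (self.media if is_media else self.data).append(size_bytes)

    def tick(self, elapsed_s):
        """Refill tokens for elapsed_s seconds and send what fits.

        Returns a list of (kind, size) tuples in the order they were sent.
        """
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed_s * self.rate_bytes_per_s)
        sent = []
        for kind, queue in (("media", self.media), ("data", self.data)):
            while queue and queue[0] <= self.tokens:
                size = queue.popleft()
                self.tokens -= size
                sent.append((kind, size))
            if kind == "media" and queue:
                break  # media is still waiting: data must not jump ahead
        return sent


# Shape to 1.6 Mbps on a 2 Mbps upstream (figures chosen for illustration).
shaper = PriorityShaper(rate_bps=1_600_000, burst_bytes=3000)
for _ in range(3):
    shaper.enqueue(1500, is_media=False)   # bulk TCP data arrives first
shaper.enqueue(200, is_media=True)         # then one voice packet
first = shaper.tick(0.0)
second = shaper.tick(0.01)
```

Even though the voice packet arrived last, it leaves first; the remaining
data packets wait for tokens. The hard part, as discussed in this thread, is
knowing which packets to pass with `is_media=True`.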

Now with WebRTC, there is no signaling protocol to help us see what is going
on, Diffserv bits cannot be set in common OSes (and maybe never when muxing
different flows over a single UDP port), signaling and media are encrypted,
and ICE is used for traversing NATs/firewalls without them having a clue what
is going on (which was the purpose of ICE...). How can we then classify
the real-time traffic and apply any level 3 QoS methods that can do miracles
in this harsh environment?
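
For reference, this is roughly what asking for a Diffserv marking looks like
at the socket level. It is only a sketch, assuming a platform where Python
exposes `socket.IP_TOS`; the helper names are hypothetical. Whether the OS
honors the request, and whether the network preserves the mark, is exactly
the open question, so the function reports what the OS says it applied
instead of assuming success.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding code point (RFC 3246)


def dscp_to_tos(dscp: int) -> int:
    """The 6-bit DSCP occupies the upper six bits of the old TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    return dscp << 2


def try_mark_udp_socket(dscp: int):
    """Ask the OS to mark outgoing packets of a fresh UDP socket.

    Returns the TOS value the OS reports back, or None if the option is
    unavailable or the request was refused.
    """
    ip_tos = getattr(socket, "IP_TOS", None)
    if ip_tos is None:
        return None
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.setsockopt(socket.IPPROTO_IP, ip_tos, dscp_to_tos(dscp))
            return sock.getsockopt(socket.IPPROTO_IP, ip_tos)
    except OSError:
        return None


applied = try_mark_udp_socket(DSCP_EF)
```

A web application running in a browser has no such call available to it, and
a value reported back by the OS says nothing about what is actually on the
wire after the first router.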

That is why I brought up the idea of enforcing TURN at the access and
classifying or routing UDP flows opened by ICE.
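
As a rough idea of what a box at the access could look at: STUN packets
(used by ICE and TURN) are recognizable from their fixed header, and RTP
packets start with version bits "10". The Python sketch below shows only
that first-glance check; the function names are hypothetical, and a real
classifier would need flow state and more validation.

```python
import struct

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value in every STUN header (RFC 5389)


def looks_like_stun(payload: bytes) -> bool:
    """Check the fixed parts of the 20-byte STUN header."""
    if len(payload) < 20:
        return False
    msg_type, msg_len, cookie = struct.unpack("!HHI", payload[:8])
    return ((msg_type & 0xC000) == 0        # two most significant bits are zero
            and msg_len % 4 == 0            # attributes are padded to 4 bytes
            and cookie == STUN_MAGIC_COOKIE)


def looks_like_rtp(payload: bytes) -> bool:
    """RTP version 2: the top two bits of the first byte are '10'."""
    return len(payload) >= 12 and (payload[0] >> 6) == 2


# A STUN Binding Request with no attributes and an all-zero transaction ID.
stun_request = struct.pack("!HHI12s", 0x0001, 0, STUN_MAGIC_COOKIE, bytes(12))
# A minimal RTP-shaped header: version 2, payload type 96, rest zero.
rtp_like = bytes([0x80, 96]) + bytes(10)
```

The RTP test is weak on its own (many unrelated payloads start with those
two bits), which is one reason to tie classification to the flows that ICE
opened through a TURN server instead of guessing per packet.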

Even if favoring UDP over TCP, using tolerant codecs, FEC, and other
endpoint smart trix helps a lot, there is a limit to what can be achieved if
you have packet loss and delay. Those are the things handled with level 3
(and below) QoS. WebRTC requires better transport than POTS voice, and
bandwidth is not everything. The network must be given a chance to
prioritize the traffic that needs it.

(Just to clarify: I don't want us to use separate networks - Let's stay on
our Internet accesses!)   


PS: Just browsed your links briefly and noted "(PCP [RFC6887]).
...surprising ...does not require the network operate a NAT..." The same
goes for TURN also, doesn't it?

-----Original Message-----
From: Dan Wing [] 
Sent: 23 September 2013 19:57
To: Karl Stahl
Subject: Re: [rtcweb] [mmusic] WGLC of

On Sep 21, 2013, at 2:44 PM, Karl Stahl <> wrote:

> Yet another thing related to
> draft-ietf-rtcweb-use-cases-and-requirements-11:
> It is about payload type, PT=, in SDP and RTP, so I am copying MMUSIC.
> Network service providers have expressed an interest to know whether 
> packets carry audio or video, to be able to handle them differently in 
> the network (e.g. quality wise). PT is visible outside the encrypted 
> payload in RTP, however if dynamic payload types PT:96-127 are used, 
> you cannot know what the payload is without knowledge of the SDP 
> (which we for WebRTC must assume the network provider has no knowledge
> In I 
> see no PTs defined for Opus, VP8, H.264 etc. considered for WebRTC.
> So, can we have payload types assigned to codecs that will be 
> recommended for WebRTC (PT:35-71 are unassigned)?
> Or can we at least split dynamic payload types PT:96-127 into groups 
> for audio and video codecs?
> In relation to that simple request, one may wonder how the network 
> can know at all what is carried in a UDP packet (the RTP header has no 
> reserved field - it could be the payload of something else).
> Quality related requirements F38, A23 and A26 in the use case draft 
> nowadays only seem to relate to the browsers, not assuming that 
> Diffserv bits or similar are conveyed to the network. That is 
> realistic, since most operating systems don't allow quality markings
> (Diffserv, TOS) of packets.
> However, 3.2.1.  Simple Video Communication Service, mentions "The web 
> service monitors the quality of the service (focus on quality of audio 
> and video) the end-users experience.". I don't understand how "The *web 
> service* monitors" based on the listed requirements. Should it be "The 
> *browsers* monitors"?
> What are then the possibilities for a network to classify traffic for 
> quality or other purposes?
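
The payload type point in the quoted message can be illustrated with a short
Python sketch: the PT is the low seven bits of the second RTP header byte
and stays readable even when the payload is encrypted, but for the dynamic
range 96-127 it means nothing without the SDP. The function names are
hypothetical, and the static table is only a small excerpt of the RTP/AVP
assignments.

```python
# Small excerpt of statically assigned audio payload types (RFC 3551).
STATIC_AUDIO_PT = {0: "PCMU", 8: "PCMA", 9: "G722"}


def rtp_payload_type(packet: bytes):
    """Return the 7-bit payload type, or None if this is not RTP version 2."""
    if len(packet) < 12 or (packet[0] >> 6) != 2:
        return None
    return packet[1] & 0x7F  # mask off the marker bit


def describe_pt(pt: int) -> str:
    """What an observer without the SDP can say about a payload type."""
    if pt in STATIC_AUDIO_PT:
        return "audio/" + STATIC_AUDIO_PT[pt]
    if 96 <= pt <= 127:
        return "dynamic: unknown without the SDP"
    return "not covered by this sketch"


# Version 2, marker bit set, PT 111, remaining header bytes zeroed.
dynamic_packet = bytes([0x80, 0x80 | 111]) + bytes(10)
# Version 2, PT 0 (PCMU), remaining header bytes zeroed.
pcmu_packet = bytes([0x80, 0]) + bytes(10)
```

With fixed assignments, or with the dynamic range split into audio and video
groups as asked for above, `describe_pt` could answer "audio" or "video" for
every packet; today it cannot.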

I agree there is a problem here, and have been trying to convince others
there is a problem that needs to be solved.  So far, there appears to be
scant agreement there is a problem.  See thread on TSVWG starting at  The
solution I am pitching is an extension to PCP, draft-wing-pcp-flowdata.
Another solution is MALICE which adds information to the ICE connectivity
checks (bandwidth, drop preference, etc.) which can be DPI'd by network

But before getting deep on solutions, we first need some consensus that
some flows need different handling than other flows, and then acquiescence
that existing techniques do not solve the problem (RSVP, NSIS, Diffserv).
Unfortunately the industry doesn't yet seem to agree there is even a


> 3G/4G networks have DPIs (Deep Packet Inspection) - such box may guess 
> what encrypted RTP traffic is... or may not...
> Real time communication protocols using ICE as a pre-protocol to 
> establish media paths give a possibility though. If the network 
> provider offers a TURN server at his access, and enforces the TURN 
> server to be used (by eating STUN packets), then the TURN server could 
> classify and mark packets of the RTP flows set up through it. Then it is 
> useful to know whether it is an audio or video packet by looking at the
> payload type.
> A LAN firewall, can also include a TURN-server, that in addition to 
> classifying and marking packets for the transport network (if honored 
> - which rarely is the case on the Internet today), it can also 
> prioritize and traffic shape so the RTP traffic at least is 
> undisturbed through a data crowded Internet access.
> A more difficult request comes from networks using bandwidth 
> reservation (RSVP) for quality, like mobile and cable networks. Such networks 
> would benefit from knowing the bandwidth used, but that is not fixed 
> for advanced codecs like the ones considered for WebRTC. One way would 
> be to reserve maximum bandwidth, and possibly repeat the reservation 
> if less is actually used.
> /Karl
> -----Original Message-----
> From: [] On behalf of 
> Karl Stahl
> Sent: 21 September 2013 02:16
> To:;
> Subject: [rtcweb] WGLC of draft-ietf-rtcweb-use-cases-and-requirements-11
> While reading the draft-ietf-rtcweb-use-cases-and-requirements-11, 
> here are a few "telephony related" WebRTC things I think should be 
> clarified in the use cases.
> 3.2.1.  Simple Video Communication Service  Description ...  
> The invited user might accept or reject the session. 
> [Suggest adding] The invited user might accept only audio, rejecting 
> video (even if a camera is enabled). A user may also select to 
> initiate an audio session, without video.
> And in API requirements:
>   ----------------------------------------------------------------
>   A1      The Web API must provide means for the application to ask the
> browser for permission to use cameras and microphones, individually as 
> input devices. (One must be able to answer with voice only - declining
>   ----------------------------------------------------------------
> Same under
> 6.2.  Browser Considerations
> ...
> The browser is expected to provide mechanisms for users to revise and 
> even completely revoke consent to use device resources such as camera 
> and microphone. [Suggest adding] Specifically, a user must be given 
> the opportunity to only accept audio in a video call invitation.
> 3.2.12.  Multiparty video communication  Description ...
> [Suggest adding] It is essential that automatic adjustment of 
> microphone volume is disabled, or microphones not spoken into are 
> muted. (This is a serious problem with most soft clients (SIP clients) 
> of today, plaguing conferences with ever increasing noise from silent 
> participants.)
> And in API requirements:
>   ----------------------------------------------------------------
>   A15     The Web API must provide means for the web application to adjust
> the level in audio streams.
>  ----------------------------------------------------------------
>   Axx     The Web API must provide means to disable any automatic volume
> adjustment in the sent audio streams. (To avoid disturbing noise in 
> conferences - making many softclients unusable).
>   ----------------------------------------------------------------
> 3.2.6.  Simple Video Communication Service, access change
> Description ...
> the user has to start a trip during the session. The communication 
> device automatically changes to use WiFi when the Ethernet cable is 
> removed and then moves to cellular access to the Internet when moving 
> out of WiFi coverage.  The session continues even though the access method
> [Question] Is this some sort of roaming without network support 
> (please clarify)? Getting a new access will also give the client a new 
> IP address, won't it? How could then the session continue? The 
> browsers will have no signaling connection and cannot renegotiate a media
> connection, can they?
> _______________________________________________
> rtcweb mailing list