Re: [rtcweb] [tsvwg] Diffserv QoS for Video

<> Tue, 10 May 2016 08:22 UTC

Date: Tue, 10 May 2016 10:18:30 +0200
List-Id: Real-Time Communication in WEB-browsers working group list <>

Hi Harald,

We should avoid reopening the last call. That's not my intent.

I work on transport only, and my rtcweb input is draft-ietf-tsvwg-rtcweb-qos-15. For an audience like me, it says little about the case where the subject is audio and video of a talking person. I think the relevant statements are rows in a table, and I'd appreciate text that helps with interpreting that table. If we can't agree on useful text, leave it as is.

A backbone supporting RTP-based TV distribution tends to be engineered to support an AF4-like QoS class for low loss. I think telephony codecs, whose RTP streams will be transported by EF, are able to deal with higher packet loss than multicast-based IPTV.

To me, an audio or telepresence/videoconference call with QoS requires admission control. This ensures that its packets are transported without queuing or drops.

If there's congestion, however:

An audio flow marked AF41 should face a lower drop rate than a video flow marked AF42 in the case of congestion at a network edge node. The packets arrive in the same order as they were sent (independent of the flow they belong to).
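
As a rough illustration of the AF41/AF42 relationship above, here is a minimal sketch of how the AF code points are built from a class and a drop precedence. The code point values are the standard ones (AF per RFC 2597, EF = 46 per RFC 3246); the helper names af_dscp, class_bits and drop_precedence are my own and only serve this example.

```python
# Sketch only: helper names are hypothetical, code point values are standard.

EF = 46  # Expedited Forwarding, binary 101110


def af_dscp(af_class: int, drop: int) -> int:
    """Return the DSCP for AF<af_class><drop>, e.g. af_dscp(4, 1) is AF41."""
    if not 1 <= af_class <= 4:
        raise ValueError("AF class must be 1..4")
    if not 1 <= drop <= 3:
        raise ValueError("drop precedence must be 1..3")
    # DSCP layout for AF: cccdd0 (3 class bits, 2 drop bits, trailing 0).
    return 8 * af_class + 2 * drop


def class_bits(dscp: int) -> int:
    """Upper three bits of the DSCP (the AF class for AF code points)."""
    return dscp >> 3


def drop_precedence(dscp: int) -> int:
    """Drop precedence bits of an AF code point (1 = dropped last)."""
    return (dscp >> 1) & 0b11


AF41 = af_dscp(4, 1)  # audio in the example above
AF42 = af_dscp(4, 2)  # video in the example above
AF43 = af_dscp(4, 3)

if __name__ == "__main__":
    # Same AF class, so the same queue is expected; audio has the lower
    # drop precedence and should therefore see fewer drops under congestion.
    print(AF41, AF42, class_bits(AF41) == class_bits(AF42))
    print(drop_precedence(AF41) < drop_precedence(AF42))
```

Whether a given node really drops AF42 before AF41 depends on how the operator configured that node; the sketch only shows the ordering the markings ask for.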

An EF-marked audio flow may experience loss events independently of a video flow marked AF4x. In the case of queuing, one of the flows will arrive earlier. If both are transported by QoS classes optimized for RTP/UDP transport, the difference is a few milliseconds per congestion point. If one of the queues is optimized for general data transport, the delay difference is likely to be a double-digit number of milliseconds. The packets of each flow arrive in the order sent, but the packets of one flow are delayed relative to those of the other.
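
For readers less familiar with how an endpoint requests these markings, the following is a minimal sketch of setting the DSCP on UDP sockets, one per flow. It assumes a platform where the IP_TOS socket option is available and honored (for example Linux); the function names are hypothetical, and networks are free to re-mark or ignore the value.

```python
import socket

# Standard code points; the names are just labels for this sketch.
DSCP_EF = 46
DSCP_AF41 = 34
DSCP_AF42 = 36


def dscp_to_tos(dscp: int) -> int:
    """Shift the 6-bit DSCP into the upper bits of the TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in 0..63")
    return dscp << 2  # the lower two bits are the ECN field


def open_marked_udp_socket(dscp: int) -> socket.socket:
    """Create an IPv4 UDP socket whose outgoing packets request `dscp`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


if __name__ == "__main__":
    # Separate flows for audio and video, both in AF4, so that they are
    # expected to share a queue and differ only in drop precedence.
    audio = open_marked_udp_socket(DSCP_AF41)
    video = open_marked_udp_socket(DSCP_AF42)
    print(audio.getsockopt(socket.IPPROTO_IP, socket.IP_TOS))
    print(video.getsockopt(socket.IPPROTO_IP, socket.IP_TOS))
    audio.close()
    video.close()
```

Marking the audio socket with DSCP_EF instead would put the two flows into different queues, which is the case where the delay difference described above can appear.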



-----Original Message-----
From: Harald Alvestrand []
Sent: Tuesday, 10 May 2016 09:36
To: Geib, Rüdiger;
Subject: Re: [tsvwg] Diffserv QoS for Video

FTR: I don't see such an agreement at all.

On the contrary, my perception is that people want the ability to deliver audio with a lower loss probability and lower delay probability than video - it's more important to the conversation, and there are fewer things the recipient can do to hide the losses. If the sender chooses to send them on separate flows, they should have different DSCP markings.

I believe this is what draft-ietf-tsvwg-rtcweb-qos-15 section 5 states, and I believe that this is what TSVWG has declared consensus on and written into the document that passed WG last call and is currently in "waiting for writeup" state.

Changing this determination would, at minimum, require reopening the WG Last Call.
And I'd object.


On 10 May 2016 08:56, wrote:
> Hi Paul,
> I think we agree that audio and video frames, if both are part of the 
> same (interactive) media flow, should be transported by the same PHB 
> [PJ] or the same queue [RG]. The latter is ensured if the same PHB is 
> picked for audio and video. To me, the text of the draft so far doesn't 
> express that both audio and video are supposed to use an "Interactive 
> Video..." PHB if both are present. I'd prefer to have text with a 
> non-binding standard requirement saying:
>      However, if the application wishes to send both interactive 
>      video and audio, it is RECOMMENDED to transport audio 
>      and video packets by the same per hop behavior. For example, 
>      audio and video packets would both be marked as AF42 or
>      AF43. 
> I don't insist on descriptive text proposing to transport audio by an AF4 PHB offering a lower drop ratio than that used to transport video. My audio/video experts support this, and I'm fairly sure that Cisco representatives have also mentioned that audio quality ranks above video quality in telepresence sessions.
> Regards,
> Ruediger
> -----Original Message-----
> From: Paul E. Jones []
> Sent: Monday, 9 May 2016 21:03
> To: Geib, Rüdiger
> Cc:;;; 
> Subject: Re: AW: [tsvwg] Diffserv QoS for Video
> Ruediger,
> Perhaps an example might be helpful.  How about I add this text for illustrative purposes?
>      To illustrate the use of the above table, let us assume the
>      application assigns a priority of "medium" to audio and video
>      flows.  Given that assumption, if the application wishes to send
>      only audio then packets would be marked EF.  However, if the
>      application wishes to send both interactive video and audio,
>      then audio and video packets would both be marked as AF42 or
>      AF43.  The intent is to ensure that when both audio and video
>      are being sent together that they receive similar per-hop
>      behavior.
> This doesn't get into the preference for AF42 vs. AF43. If it were me, I'd mark all audio as AF42 and only key video frames as AF42.  All predictive frames would be sent with an AF43 marking.  I might even take it a step further and classify all audio as "high".  However, I've seen a tremendous amount of debate on this before, so I'd prefer to not go too far in dictating audio markings vs. video.  I do think most people generally agree about at least ensuring the class is the same, otherwise the wildly different PHB introduces skew between A/V packet arrival, thus inflating the size of buffers managing the A/V streams.  However, we do not want to dictate that audio should be treated significantly better than video.  For deaf users, for example, the audio really isn't important at all.  That is perhaps an extreme example, but it nonetheless highlights why we should be cautious about exactly what we normatively mandate.
> Paul
> ------ Original Message ------
> From:
> To:
> Cc:;;; 
> Sent: 5/9/2016 3:34:25 AM
> Subject: AW: [tsvwg] Diffserv QoS for Video
>> Hi Paul,
>> I've talked with audio/video experts of Deutsche Telekom and they too 
>> favored what you recommend below: transport audio and video by the 
>> same queue. Your statement below however stops there and the draft 
>> text doesn't clarify the issue:
>> If there's interactive video with audio, then they both should be 
>> marked for the same PHB which is:
>> - EF ?
>> - AF4? Like AF41 Audio, AF42 Video (AF43 in addition, if P or B 
>> frames are to receive a lower priority /
>>   higher drop precedence PHB)?
>> I personally prefer AF4 if audio and video are to be transported in 
>> the same queue.
>> I'd also ask for the draft text to be clear about when to 
>> mark audio by the EF PHB. My understanding after reading your 
>> statement below is: audio is marked EF only if there's no video flow.
>> ...
>> BC>Finally, why is audio not also subdivided into interactive and 
>> BC>non-interactive? As far as I can see, both are logically possible.
>> [PJ] For WebRTC, audio alone is "interactive" in nature (which is
>>   why it's marked EF).  However, if one is sending audio and video
>>   it makes sense to mark them both the same way to get the same PHB and
>>   hopefully have them queued in the same buffers along the path.
>>   Sending audio as EF and possibly having a PHB that results in
>>   packets arriving much faster than corresponding video packets
>>   marked as AF42 is not at all helpful for applications that have
>>   to synchronize the audio and video flows.
>> ...
>>   Paul
>> Regards,
>> Ruediger