Re: [xrblock] Reply: Comments on draft-ietf-xrblock-rtcp-xr-synchronization-01

Mario Montagud Climent <> Thu, 08 November 2012 19:30 UTC

Date: Thu, 08 Nov 2012 20:30:28 +0100
From: Mario Montagud Climent <>
To: Qin Wu <>
Cc: "" <>
Subject: Re: [xrblock] Reply: Comments on draft-ietf-xrblock-rtcp-xr-synchronization-01

Hi Qin, Rachel, all,

See our comments inline.

Qin Wu <> wrote:

> Hi Mario and Fernando,
> Thanks for your valuable reviews. Let me try to clarify your concerns.
> Also please see my reply below inline.
> Regards!
> -Qin
> -----Original Message-----
> From: Mario Montagud Climent []
> Sent: 2012-11-04 2:52
> To:
> Cc: Huangyihong (Rachel); Hitoshi Asaeda; Qin Wu
> Subject: Comments on draft-ietf-xrblock-rtcp-xr-synchronization-01
> Hi all,
> We (Fernando Boronat and I) have reviewed the updated version of
> draft-ietf-xrblock-rtcp-xr-synchronization-01 and read the issues
> associated with this draft that were raised recently in the mailing list.
> Here are our comments and suggestions:
> Comments regarding the 'Initial Synchronization Delay' Metric:
> - We still find the definition of this metric a bit confusing.
> We are dealing with INTER-STREAM Synchronization Delay, so we think
> the word INTER-STREAM should be added to the definition of this metric
> for better clarity.
> [Qin]: The Initial Synchronization Delay we are using for this metric
> is clearly specified in RFC 6051. The key feature is "Initial" rather
> than "inter-stream". I don't believe it refers to the time difference
> between two streams. Rather, it means how long it takes to receive all
> the components of a multimedia session or layered session. But I agree
> that inter-stream applies to the synchronization offset metric we
> defined in this draft.
> As we understand from the RFC 6051, an appropriate definition could
> be: "In multimedia streaming services, the (inter-stream)
> synchronization delay refers to the time difference between the moment
> a user joins a (multicast) multimedia session, probably involving more
> than one media stream (e.g., audio and video, or when using layered
> and/or multi-description codecs), and the instant when the correlated
> media streams can be synchronously presented to that user, i.e. when
> RTCP packets (including SDES and SR reports), or when the first RTP
> packets with header extensions including in-band synchronization
> metadata, have been received on all the involved RTP sessions in the
> multimedia session".
> [Qin]: Looks good to me except the wording about "inter-stream".

[F & M] We didn't refer to the time difference (offset) between  
different streams, but to the time difference between joining a  
multimedia session (or a reference RTP session) and the instant at  
which all involved media streams can be initially presented to the  
users in a synchronous way. Therefore, we think this metric gives the  
INITIAL delay for INTER-STREAM synchronization, because all the  
involved media streams cannot be synchronously presented to the users  
until the info needed for synchronizing all of them (included in RTCP  
packets or in RTP header extensions, as specified in RFC 6051) has  
been received on all the component RTP sessions. This was our  
rationale for including the term INTER-STREAM in this definition.
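To make the definition above concrete, here is a minimal Python sketch. It assumes that the receiver records its join instant and, for each component RTP session, the wallclock time at which the first synchronization metadata arrived (RTCP SR plus SDES CNAME, or an RFC 6051 header extension). The function name and session labels are hypothetical and are not taken from the draft.

```python
from typing import Dict, Optional


def initial_sync_delay(join_time: float,
                       first_sync_info: Dict[str, Optional[float]]) -> Optional[float]:
    """Initial inter-stream synchronization delay in seconds, or None.

    join_time: wallclock time (s) at which the receiver joined the
        multimedia session (or the chosen reference RTP session).
    first_sync_info: per component RTP session, the wallclock time (s) at
        which its first synchronization metadata arrived, or None if it
        has not arrived yet.
    """
    if not first_sync_info or any(t is None for t in first_sync_info.values()):
        # Synchronous presentation is not yet possible for all streams.
        return None
    # All streams can be presented in sync only once the LAST component
    # session has delivered its synchronization metadata.
    return max(first_sync_info.values()) - join_time


# Audio sync info arrives 0.4 s after joining, video sync info after 1.25 s.
print(initial_sync_delay(10.0, {"audio": 10.4, "video": 11.25}))  # 1.25
```

The maximum over sessions captures the point above: no stream can be presented synchronously until the synchronization information for every involved session has been received.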

> This draft applies the metric defined in RFC 6051 for receivers. As
> recently discussed in the mailing list, we also think that the method
> of measurement of this metric when a set of receivers are involved in
> a multimedia session (probably involving several RTP sessions) is not
> clear enough, mainly due to the use of "the instant when a member
> joins a media session" as the reference point.
> In such a case, the receiver will probably join a group of RTP sessions
> (a multimedia session), so we think that the "Initial Sync Delay" in
> this draft should be defined using a specific reference RTP session as
> the reference point. Otherwise, is it expected that a receiver will
> join all the correlated RTP sessions almost simultaneously?
> [Qin]: I agree with your opinion; we can't assume the receiver joins
> multiple sessions simultaneously.

[F&M] OK

> We understand that the minimization of this metric is important in
> multimedia streaming services, e.g. for minimizing zapping delays, but
> we would like to see in this draft the utility of this RFISD block.
> For example, should the receiver of this report (we assume the media
> source) do something when receiving it? Is this RFISD block only used
> for informational purposes?
> [Qin]: The information in this metric report can be used by the
> receiver of this report to compare the actual initial synchronization
> delay to targets (i.e., a numerical objective or Service Level
> Agreement) to help ensure the quality of real-time application
> performance.

[F&M] We suggest adding a similar paragraph to the draft.

> Furthermore, is it expected that all the receivers join the multimedia
> session (or the group of RTP sessions) almost simultaneously? For
> instance, there could be significant delay differences between the
> instants at which different receivers join the same multimedia session
> (or the set of RTP sessions). In such a case, the measurement of the
> inter-stream synchronization delay should not have the same reference
> point for all of the receivers. Do we need some mechanisms to
> establish the same reference point or to indicate the exact instant
> for the reference point in each receiver?
> [Qin]: We allow different receivers to report different initial
> synchronization delays, since these receivers join at different times.
> It doesn't matter, since what we report is a per-receiver metric.

[F & M] OK. But, in this way, if different receivers, even under
similar network conditions (e.g., delays, jitter, etc.), join the session
at different instants but inside the same RTCP report interval (i.e.,
between two consecutive RTCP packets sent by the media source for that
session), there could be significant differences between the "Initial
Sync Delays" reported by each of them. For example, depending on
the joining time, a receiver A could receive the RTCP packets from all
the RTP sessions with a larger/lower initial delay than another receiver
B if the latter joined the session later/earlier than the former.

Even though you only want to "compare actual initial synchronization  
delay to targets (i.e., a numerical objective or Service Level  
Agreement)", as pointed out in your previous comment, could the above  
situation be problematic?
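The effect described above can be sketched roughly as follows. The sketch assumes that the media source sends SRs at a fixed period. Real RTCP does not: RFC 3550 randomizes each interval between 0.5 and 1.5 times the nominal value. The helper name is hypothetical.

```python
import math


def wait_for_first_sr(join_time: float, first_sr_time: float,
                      interval: float) -> float:
    """Seconds a receiver waits after joining until the next RTCP SR.

    Simplifying assumption: SRs are sent exactly every `interval` seconds
    starting at `first_sr_time` (real RTCP randomizes the interval).
    """
    if join_time <= first_sr_time:
        return first_sr_time - join_time
    # Index k of the first SR sent at or after the join instant.
    k = math.ceil((join_time - first_sr_time) / interval)
    return first_sr_time + k * interval - join_time


# Two receivers join within the same 5 s reporting interval [5, 10).
print(wait_for_first_sr(5.5, 0.0, 5.0))  # 4.5
print(wait_for_first_sr(9.5, 0.0, 5.0))  # 0.5
```

Both receivers share the same network conditions and the same reporting interval. Even so, the one that joins earlier reports a delay nine times larger, purely because of when it joined.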

> We also think that the terms "start/beginning of session" in the text
> should be replaced or better explained.
> - We also think that using 1/65536 second units (giving approximately
> 15 microsecond resolution), instead of a 64-bit timestamp, should be
> accurate enough for Initial Sync Delay reporting. But wouldn't it be
> more practical and simpler to use the same measurement units for both
> XR blocks?
> [Qin]: It looks good to me; however, what accuracy requirements do we
> have for the initial sync delay? Why is the accuracy of the initial
> sync delay we currently define not sufficient?

[F & M] We think this accuracy is sufficient. Our comment was motivated
by simplicity (this way, both reports could employ the same
measurement unit).
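For reference, converting between seconds and 1/65536 s units can be sketched as follows. The sketch assumes a plain unsigned integer count of units; the field width and wire encoding are not taken from the draft. A 64-bit NTP timestamp, by contrast, has a resolution of 2^-32 s.

```python
UNITS_PER_SECOND = 65536  # 1/65536 s per unit


def seconds_to_units(seconds: float) -> int:
    """Round a duration in seconds to the nearest 1/65536 s unit."""
    return round(seconds * UNITS_PER_SECOND)


def units_to_seconds(units: int) -> float:
    """Convert a count of 1/65536 s units back to seconds."""
    return units / UNITS_PER_SECOND


# One unit is 1e6 / 65536 = 15.2587890625 microseconds.
RESOLUTION_US = 1e6 / UNITS_PER_SECOND

print(seconds_to_units(1.5))     # 98304
print(units_to_seconds(98304))   # 1.5
```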

> - "SSRC of Media Source" -> Shouldn't this draft specify a policy for
> choosing the component SSRC of a multimedia session to report on this
> metric? The draft indicates an arbitrary stream; maybe one option
> could be the SSRC identifier of the component RTP session with the
> longest RTCP reporting interval ...
> [Qin]: No, we may report on each media stream that belongs to the
> same multimedia session.

[F & M] We are not sure we are following you here. Must one RFSI block
be sent for each SSRC (of each of the involved RTP sessions) in a
multimedia session? Wouldn't it be easier to report only on one
reference RTP session?

We see different options for selecting the reference RTP session for
the "initial sync delay" (i.e. the reference point for this metric).
One option could be selecting the session with the longest RTCP
reporting interval, as you proposed. Here, we think the "delay for
RTCP reporting interval" concept must be defined precisely to avoid
misunderstanding. On the one hand, we can see the delay for the RTCP
reporting interval of a media session as the time difference between
two successive RTCP packets from this media session. On the other
hand, we can also see, from the receiver's point of view, the delay for
the RTCP reporting interval as the time difference between joining a
media session and receiving the first RTCP packet from that media
session. In the latter case, the delay for the RTCP reporting interval
will depend on the specific joining time of each receiver (each
receiver can join a session at a different instant during the RTCP
reporting interval). We propose to clarify this in the draft, if

Another option could be "choosing the time when the first/last RTP
session is joined as the beginning of the multimedia session", as
Rachel proposed.

We consider it more feasible to choose the time when a receiver
joins the FIRST RTP session of the multimedia session as the
starting/reference point for the "initial sync delay" measurements.
Using the RTCP reporting delay to choose a reference session
could be problematic.

However, with our proposal, we are assuming that the joining
time for the first RTP session can be known by all the other RTP
sessions involved in the multimedia session (if we do not want to
assume this info can be shared between the RTP sessions, we will
have to assume that the joining time is almost the same for all the
RTP sessions).
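The two candidate reference points discussed above (first joined versus last joined RTP session) can be compared with a small sketch. As noted above, it assumes that the join times of all component sessions are available to the code computing the metric. The function name is hypothetical.

```python
from typing import Dict


def delay_from_reference(join_times: Dict[str, float],
                         sync_info_times: Dict[str, float],
                         use_first_join: bool = True) -> float:
    """Initial sync delay measured from the first or the last joined session.

    join_times: wallclock time (s) at which each component RTP session
        was joined.
    sync_info_times: wallclock time (s) at which each session's first
        synchronization metadata arrived.
    """
    joins = join_times.values()
    reference = min(joins) if use_first_join else max(joins)
    return max(sync_info_times.values()) - reference


joins = {"audio": 0.0, "video": 0.5}
sync = {"audio": 1.25, "video": 2.0}
print(delay_from_reference(joins, sync, use_first_join=True))   # 2.0
print(delay_from_reference(joins, sync, use_first_join=False))  # 1.5
```

The two choices differ by the gap between the first and the last join. This is why the reference point needs to be fixed in the draft for reported values to be comparable.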

> Comments regarding the 'Synchronization Offset' Metric:
> - We also agree that reporting synchronization offset on a per-report
> basis (instead of a per-packet basis) can be sufficient.
> - We think that the definition of this metric is clearer in the
> current version of the draft.
> We agree with the importance of minimizing the "sync offset" for
> guaranteeing QoE. As specified in RFC 3550, synchronization between
> two media streams, i.e. inter-stream synchronization, can be achieved
> by using the source identification (i.e. the CNAME item), included in
> the SDES reports, and the NTP-RTP timestamps correlation info,
> included in the SRs, from the different media streams.
> So, as for the RFISD block, we would like to see the utility of
> this block in the draft. Should the receiver of the RFSO block (we
> assume the media source/s) do something (e.g. adaptation mechanisms)
> when receiving it? Is this RFSO block only used for informational
> purposes? We think this should be clarified in the draft for both XR
> blocks.
> [Qin]: I think the information received from the RFSO block is
>    valuable to network managers in troubleshooting network and user
>    experience issues.
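For reference, the RFC 3550 mechanism mentioned above begins with a receiver grouping streams by their SDES CNAME item. Only streams in the same group are candidates for inter-stream synchronization, using the NTP/RTP timestamp pairs in their SRs. Here is a minimal sketch; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_cname(ssrc_to_cname: Dict[int, str]) -> Dict[str, List[int]]:
    """Group SSRCs by SDES CNAME; each group is a set of streams that can
    be synchronized using the NTP/RTP timestamp pairs in their SRs."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for ssrc, cname in ssrc_to_cname.items():
        groups[cname].append(ssrc)
    return dict(groups)


streams = {0x1111: "user@host-a", 0x2222: "user@host-a", 0x3333: "other@host-b"}
print(group_by_cname(streams))
```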
> - The draft specifies a new XR block for reporting Synchronization
> Offset between correlated media streams. One of the streams is
> selected as the reference, so we think that some criteria for choosing
> that reference (i.e., master) media stream should be added in the
> draft. Otherwise, different receivers could select different streams
> as the reference one.
> Now, the draft indicates (on page 6) that the reference stream
> "can be chosen as the arbitrary stream with minimum delay according to
> the common criterion defined in section of [Y.1540]".
> Using this mechanism, different receivers could select different
> streams as the reference one. Could this be problematic?
> [Qin]: Good question. We may choose the SSRC identifier of the
> session in the multimedia session with the longest RTCP reporting
> interval, since RFSO deals with multiple sessions that belong to the
> same multimedia session.

[F & M] Thanks. We are not sure about the suitability of this  
assumption. We do not see why the "stream with the longest RTCP  
reporting interval" should be selected as the reference stream for the  
"sync offset" measurement because of the above discussion. We think  
other mechanisms should be discussed. Possible options include the  
most lagged/advanced RTP media streams (i.e. the ones with the  
highest/lowest reception or presentation, i.e. end-to-end, delays) or  
a fixed reference stream selected based on other criteria.

Besides, the delay for the RTCP reporting interval is not
necessarily linked to the delay for the RTP media stream. A media
session could have the longest RTCP reporting interval while the delay
for its RTP media stream is still acceptable.

For the "sync offset", we think it is better that the reference stream  
is selected based on the experienced delay for the involved RTP  
streams, because the sync offset is measured for the RTP streams, and  
the associated sync adjustments to minimize this offset are also  
performed on the RTP streams.

> Another question is: should each one of the receivers report on the
> "sync offset" for each one of the involved slave streams (in the
> multimedia or layered session) regarding the reference stream (i.e.,
> the one with offset zero)? Or, alternatively, is it sufficient to report
> the "sync offset" between the most lagged and most advanced
> streams? In such a case, which SSRC should be the reporting stream for
> the RFSO? We think this should be clarified in the draft.
> [Qin]: I think we should allow both. What we need to do is choose a
> reference stream based on some criteria.
> - About the inclusion of signed or unsigned timestamp values for the
> "Sync Offset" metric, we find the discussions raised recently in the
> mailing list very reasonable.
> This draft does not specify definitive criteria on selecting the
> reference media stream. The document indicates (on page 6) that the
> reference stream "can be chosen as the arbitrary stream with minimum
> delay according to the common criterion defined in section of
> [Y.1540]".
> Maybe, if we do not want to include signed values or omit the
> "Lag/Lead" bit, the simplest solution could be to select the slowest
> (i.e. most lagged) or the fastest (i.e. most advanced) stream as the
> reference stream. This reference stream can be identified by the
> receiver of this report because its SSRC is included (SSRC of
> Reference) in the RFSO block. This way, the offset value will also
> always have the same interpretation: the time difference between the
> reference (i.e. most lagged/advanced) stream and the reporting stream,
> identified by "SSRC of Source".
> Selecting the stream with the lowest delay as the reference could be
> an issue for live streams, since receivers cannot speed up a (slave)
> live stream to become synchronized.
> If we do not want to restrict this to a single use case, i.e. if we
> want the reference stream to be an arbitrary stream, then we will need
> to decide on one of the proposals discussed in the mailing list for
> that, for example, the one currently included in the draft.
> [Qin]: Okay, I tend to agree with your first proposal.

[F&M] Thanks
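The agreed option of using the most lagged stream as the reference can be sketched as follows. The sketch assumes a per-stream delay estimate is already available; whether it is a network or an end-to-end delay is left open, as in the discussion. All names are hypothetical.

```python
from typing import Dict, Tuple


def offsets_from_most_lagged(delays: Dict[int, float]) -> Tuple[int, Dict[int, float]]:
    """Use the most lagged stream as reference and return unsigned offsets.

    delays: per-SSRC delay estimate in seconds (how it is measured is
        left open here).
    Returns (reference SSRC, {SSRC: reference delay - stream delay}), so
    every offset is >= 0.
    """
    reference = max(delays, key=lambda ssrc: delays[ssrc])
    ref_delay = delays[reference]
    offsets = {ssrc: ref_delay - d for ssrc, d in delays.items() if ssrc != reference}
    return reference, offsets


ref, offsets = offsets_from_most_lagged({0xA: 0.5, 0xB: 0.25, 0xC: 0.375})
print(hex(ref), offsets)        # 0xa {11: 0.25, 12: 0.125}
# Spread between the most lagged and the most advanced stream:
print(max(offsets.values()))    # 0.25
```

Because the reference has the largest delay, every reported offset is non-negative, so no sign or Lag/Lead bit is needed. The largest offset also gives the lag between the most lagged and the most advanced streams, which covers the second reporting alternative raised above.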
> - And, finally, in our opinion, a very important issue is:
> This draft specifies the Synchronization Offset between correlated
> media streams taking into account the arrival times of RTP packets for
> the considered streams (see formula on page 8).
> Working with reception times may not be accurate enough for use cases
> with stringent inter-stream synchronization requirements, especially
> when different types of media streams are involved. This is because
> the different RTP streams could experience variable delays at the
> receiver side, i.e. from the reception instant of RTP packets until
> the instant at which the media units (e.g. video frames or audio
> samples) included in these RTP packets are played out, mainly due to
> different de-packetizing, de-payloading, de-coding, rendering,
> processing delays, etc.  So, if we want to report on accurate sync
> offset values, we should consider presentation times for the involved
> media streams, as in our IDMS draft. Do you think this requirement is
> also needed for this draft?
> [Qin]: Not sure about this. Could you clarify how to use
> presentation times to calculate the sync offset?
> Another issue when measuring "sync offset" (per report basis)
> considering RTP arrival times is that this measurement can be
> significantly affected by the existence of network jitter. As the
> streams are sent independently, the RTP packets (or the specific RTP
> packet for which this metric is reported) of media stream A could
> experience low jitter delays, whilst the RTP packets (or the specific
> RTP packet for which this metric is reported) of media stream B could
> (sporadically) experience high jitter delays, so this would lead to
> the reporting of high and variable "sync offset" values. As a result,
> the measurement will be variable rather than smooth.
> [Qin]: We have proposed changing RTP timestamps to NTP timestamps.

[F & M] In order to use presentation times, we would need to track RTP
packets from their arrival to their presentation (or playout) times.
This can be seen as a form of layer violation in some RTP
implementations, as previously discussed in the AVTCORE list for our
IDMS draft. That is the reason why, in our IDMS draft, reporting on
presentation times is optional, but reporting on arrival times is
mandatory.
But, if presentation times are supported, the sync offset could be
calculated more easily (and more accurately) than in the current
proposal. The calculation is as follows:

Different times for stream A: t_i_A (transmission time, i.e. RTP  
timestamp of i-th RTP packet of stream A), r_i_A (NTP-based arrival  
time of i-th RTP packet of stream A), p_i_A (NTP-based presentation  
time of i-th RTP packet of stream A)

Different times for stream B: t_j_B (transmission time, i.e. RTP  
timestamp of j-th RTP packet of stream B), r_j_B (NTP-based arrival  
time of j-th RTP packet of stream B), p_j_B (NTP-based presentation  
time of j-th RTP packet of stream B).

Therefore, the sync offset between stream A and B can be calculated as:

- Using presentation times: (p_i_A - t_i_A) - (p_j_B - t_j_B), this  
gives the end-to-end delay variability

- Using reception times: (r_i_A - t_i_A) - (r_j_B - t_j_B), this gives  
the network delay variability

* Note that we are assuming that RTP timestamps can be mapped to  
NTP-format timestamps (for the RTP transmission timestamps), based on  
the correlation timing info included in RTCP SRs.
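The RTP-to-NTP mapping assumed in the note above can be sketched using the (NTP timestamp, RTP timestamp) pair carried in each stream's most recent RTCP SR, as described in RFC 3550. The sketch assumes the stream's RTP clock rate is known (e.g. 90000 Hz for video) and ignores sender clock drift since that SR. The function names are hypothetical.

```python
def ntp64_to_seconds(ntp: int) -> float:
    """Convert a 64-bit NTP timestamp (32.32 fixed point) to seconds."""
    return (ntp >> 32) + (ntp & 0xFFFFFFFF) / 2**32


def rtp_to_ntp_seconds(rtp_ts: int, sr_rtp_ts: int, sr_ntp_seconds: float,
                       clock_rate: int) -> float:
    """Map an RTP timestamp to sender NTP time using the last SR's pair."""
    # Difference modulo 2**32, interpreted as signed, handles wrap-around
    # and timestamps slightly earlier than the SR's RTP timestamp.
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr_ntp_seconds + diff / clock_rate


sr_ntp = ntp64_to_seconds((3_000_000_000 << 32) | (1 << 31))  # 3e9 + 0.5 s
print(rtp_to_ntp_seconds(90000 + 45000, 90000, sr_ntp, 90000))  # 3000000001.0
```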

Therefore, we think that the measurement of the "sync offset" metric
will be more accurate (and smoother) when using presentation times than
when using reception times, because of the variable delays on the
distribution side and at the receiver side.
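Both calculations above (presentation-based and reception-based) can be written directly in code. The sketch assumes all times are already in seconds on the NTP wallclock, with transmission times obtained from RTP timestamps via the SR mapping. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PacketTimes:
    """NTP-based times (s) for one RTP packet of a stream."""
    sent: float       # t: RTP timestamp mapped to NTP time via the SR
    arrived: float    # r: arrival time at the receiver
    presented: float  # p: presentation (playout) time


def network_sync_offset(a: PacketTimes, b: PacketTimes) -> float:
    """(r_i_A - t_i_A) - (r_j_B - t_j_B): network delay variability."""
    return (a.arrived - a.sent) - (b.arrived - b.sent)


def end_to_end_sync_offset(a: PacketTimes, b: PacketTimes) -> float:
    """(p_i_A - t_i_A) - (p_j_B - t_j_B): end-to-end delay variability."""
    return (a.presented - a.sent) - (b.presented - b.sent)


# Made-up times: stream A is 31.25 ms behind B on the network, but
# receiver-side processing widens the gap to 125 ms at playout.
pkt_a = PacketTimes(sent=100.0, arrived=100.0625, presented=100.25)
pkt_b = PacketTimes(sent=100.0, arrived=100.03125, presented=100.125)
print(network_sync_offset(pkt_a, pkt_b))     # 0.03125
print(end_to_end_sync_offset(pkt_a, pkt_b))  # 0.125
```

With these illustrative numbers, an arrival-based report would understate the asynchrony that the user actually perceives at playout.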

> - For both XR blocks defined in this draft, it is stated that: "If the
> measurement is unavailable, the value of this field with all bits set
> to 1 SHOULD be reported". But, if the measurements are unavailable,
> why are these XR blocks needed? Wouldn't it be better to simply not
> send these XR blocks?
> [Qin]: These XR blocks may be sent in each RTCP report interval. If
> we do not send them, the receiver of the XR blocks will regard them
> as lost, which is not what we expect.
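Here is a sketch of how a receiver might interpret the "all bits set to 1" convention quoted above. An explicit "unavailable" value lets it tell "measurement not available" apart from "block not received". The 32-bit field width is an assumption for illustration, and the names are hypothetical.

```python
from typing import Optional

FIELD_BITS = 32  # assumed field width, for illustration only
UNAVAILABLE = (1 << FIELD_BITS) - 1  # all bits set to 1


def encode_metric(value: Optional[int]) -> int:
    """Encode a measurement, using the all-ones value for 'unavailable'."""
    if value is None:
        return UNAVAILABLE
    if not 0 <= value < UNAVAILABLE:
        raise ValueError("measurement does not fit in the field")
    return value


def decode_metric(field: int) -> Optional[int]:
    """Return the measurement, or None if the sender marked it unavailable."""
    return None if field == UNAVAILABLE else field


print(hex(encode_metric(None)))              # 0xffffffff
print(decode_metric(encode_metric(98304)))   # 98304
```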
> - Finally, we think that this draft should indicate when the proposed
> XR blocks are sent. Should the RFISD block be sent only once per media
> session? Should the RFSO block be sent in each RTCP report interval in
> a compound RTCP packet?
> [Qin]: For RFISD, both are allowed, but I think it makes more sense to
> send it only once per media session.

[F & M] We assume that these RTCP report blocks will be sent in  
compound RTCP packets. Regarding the RFSI block, we assume that it is  
only needed once per session (since it reports on INITIAL Sync Delay,  
it is not needed during the session lifetime). Regarding the RFSO  
block, we think that two options could be employed: 1) to send this  
block in each RTCP report interval, independently of the value of the  
"sync offset"; 2) to send this block only if the value of the "sync  
offset" exceeds a configurable allowed asynchrony threshold.

Therefore, if these XR blocks are not included in the received RTCP
compound packet, we think it can be assumed that the metrics these
blocks report on are not available, without the need to send the
blocks (because in such a case, they do not contain any further
statistics).
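The two RFSO sending options listed above, combined with the suggestion to omit the block when the metric is unavailable, can be sketched as a simple decision function. The policy names and the 80 ms default threshold are hypothetical placeholders, not values from the draft.

```python
from typing import Optional


def should_send_rfso(offset_s: Optional[float], policy: str,
                     threshold_s: float = 0.080) -> bool:
    """Decide whether to include an RFSO block in this compound RTCP packet.

    policy: "every_interval" (option 1) or "threshold" (option 2).
    threshold_s: hypothetical allowed asynchrony, in seconds.
    """
    if offset_s is None:
        # Metric unavailable: omit the block instead of sending all-ones.
        return False
    if policy == "every_interval":
        return True
    if policy == "threshold":
        # abs() because the sign convention of the offset is still open.
        return abs(offset_s) > threshold_s
    raise ValueError(f"unknown policy: {policy}")


print(should_send_rfso(0.020, "every_interval"))  # True
print(should_send_rfso(0.020, "threshold"))       # False
print(should_send_rfso(0.120, "threshold"))       # True
```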

Best Regards,

Fernando & Mario.

> Proposed minor changes to the text:
> - Abstract: "and associated with SDP parameters" -> "and the
> associated SDP parameters".
> [Qin]: Okay.
> Should the SDP acronym be defined in the abstract?
> [Qin]: Okay.
> - Section 1.
> The second paragraph should be rewritten for better clarity. In our
> opinion the terms "to establish multimedia session", "start of RTP
> sessions" or "acquires all components of RTP sessions" should be
> clarified throughout the document.
> [Qin]: Okay.
> "with the same RTCP CNAME included in RTCP SDES packets" -> "with the
> same CNAME item included in RTCP SDES packets"
> [Qin]: Okay.
> Is the term "General" needed in the name of the XR block: "RTP
> Flow General Synchronization Offset Report Block"? If so, shouldn't its
> BT identifier be RFGSO, instead of RFSO? In some places in the text,
> it appears without the word "General".
> [Qin] Okay, will fix this in the later version.
> - Section 2. See our comments about the definition of the metrics.
> "or in different media types" -> "or OF different media types"
> "time difference of" -> "time difference BETWEEN"
> [Qin]: Okay.
> - Section 3.
> (line 3) "or the multimedia session" -> "or a multimedia session
> (probably involving more than one correlated RTP sessions)"
> "The component RTP session" -> "The component RTP sessions"
> "in multimedia session" -> "in a multimedia session"
> "is contributed to such initial synchronization playout" ->
> "contributes to such initial delay for synchronizing playout".
> "In the presence/absence of packet loss" -> "In the presence/absence
> of RTCP packet loss"
> (line 21) "needs to based" -> "needs to be based"
> "For example, one audio stream and one video stream belong to the same
> session and audio stream are transmitted lag behind video stream for
> multiple tens of milliseconds" -> "For example, one audio stream and
> one video stream belonging to the same session, and the audio stream
> is transmitted behind the video stream for multiple tens of
> milliseconds (i.e. the audio stream is lagged regarding the video
> stream)"
> "between video stream and audio stream" -> "between the video and the
> audio streams"
> [Qin]: Okay.
> - Section 4.
> "RTP Flows Initial Synchronization Delay Report Block" vs "RTP Flow
> Initial Synchronization Delay  Metrics Block" -> The same name should
> be used all over the document.
> [Qin]: Okay.
> - Section 5.
> (line 3) "is sent" -> "can be sent"
> "reports synchronization offset of the arbitrary two RTP streams" ->
> see our comments above.
> [Qin]: Okay.
> - Section 6.1. The SDP acronym is already defined before.
> - Section 7.
> "for RTCP XR" -> "for RTCP XR blocks"
> [Qin]: Okay. Many thanks.
> Hope it helps
> Best Regards,
> Fernando & Mario.
> "Huangyihong (Rachel)" <> wrote:
>> Hi folks,
>> We have just updated draft-ietf-xrblock-rtcp-xr-synchronization. In
>> this version, we addressed the issues raised recently in the mailing
>> list. If you have any different opinions, please let us know.
>> Best Regards!
>> Rachel
>> -----Original Message-----
>> From: []
>> Sent: Thursday, October 18, 2012 1:46 PM
>> To: Qin Wu
>> Cc: Huangyihong (Rachel);
>> Subject: New Version Notification for
>> draft-ietf-xrblock-rtcp-xr-synchronization-01.txt
>> A new version of I-D, draft-ietf-xrblock-rtcp-xr-synchronization-01.txt
>> has been successfully submitted by Qin Wu and posted to the
>> IETF repository.
>> Filename:	 draft-ietf-xrblock-rtcp-xr-synchronization
>> Revision:	 01
>> Title:		 RTP Control Protocol (RTCP) Extended Report (XR) Blocks for
>> Synchronization Delay and Offset Metrics Reporting
>> Creation date:	 2012-10-18
>> WG ID:		 xrblock
>> Number of pages: 11
>> URL:
>> Status:
>> Htmlized:
>> Diff:
>> Abstract:
>>    This document defines two RTP Control Protocol (RTCP) Extended Report
>>    (XR) Blocks and associated with SDP parameters that allow the
>>    reporting of synchronization delay and offset metrics for use in a
>>    range of RTP applications.
>> The IETF Secretariat
>> _______________________________________________
>> xrblock mailing list