Re: [xrblock] ??: Comments on draft-ietf-xrblock-rtcp-xr-synchronization-01

Qin Wu <> Thu, 15 November 2012 09:03 UTC

From: Qin Wu <>
To: Mario Montagud Climent <>
Date: Thu, 15 Nov 2012 17:03:08 +0800

Thanks for your detailed comments. :-) I have removed the comments on which we have no open issues.
Please see my reply inline below.

----- Original Message ----- 
From: "Mario Montagud Climent" <>
To: "Qin Wu" <>
Cc: <>; "Huangyihong (Rachel)" <>; "Hitoshi Asaeda" <>
Sent: Friday, November 09, 2012 3:30 AM
Subject: Re: ??: Comments on draft-ietf-xrblock-rtcp-xr-synchronization-01

> Hi Qin, Rachel, all,
> See our comments inline.
> Qin Wu <> wrote:
>> Hi,Mario and Fernando:
>> Thanks for your valuable review. Let me try to address your concerns.
>> Also please see my reply below inline.
>> Regards!
>> -Qin
>> -----Original Message-----
>> From: Mario Montagud Climent []
>> Sent: 4 November 2012 2:52
>> To:
>> Cc: Huangyihong (Rachel); Hitoshi Asaeda; Qin Wu
>> Subject: Comments on draft-ietf-xrblock-rtcp-xr-synchronization-01
>> Hi all,
>> We (Fernando Boronat and I) have reviewed the updated version of
>> draft-ietf-xrblock-rtcp-xr-synchronization-01 and read the issues
>> associated with this draft that were raised recently on the mailing list.
>> Here are our comments and suggestions:
>> Comments regarding the 'Initial Synchronization Delay' Metric:
>> - We still find the definition of this metric a bit confusing.
>> We are dealing with INTER-STREAM Synchronization Delay, so we think
>> the word INTER-STREAM should be added in the definition of this metric
>> for better clarity.
>> [Qin]: The initial synchronization delay we are using for this metric
>> is clearly specified in RFC 6051. The key feature is "initial" rather
>> than "inter-stream". I don't believe it refers to the time difference
>> between two streams. Rather, it means how long it takes to receive all
>> the components of a multimedia session or layered session. But I agree
>> that "inter-stream" applies to the synchronization offset metric we
>> define in this draft.
>> As we understand from the RFC 6051, an appropriate definition could
>> be: "In multimedia streaming services, the (inter-stream)
>> synchronization delay refers to the time difference between the moment
>> a user joins a (multicast) multimedia session, probably involving more
>> than one media streams (e.g., audio and video, or when using layered
>> and/or multi-description codecs), and the instant when the correlated
>> media streams can be synchronously presented to that user, i.e. when
>> RTCP packets (including SDES and SR reports), or when the first RTP
>> packets with header extensions including in-band synchronization
>> metadata, have been received on all the involved RTP sessions in the
>> multimedia session".
>> [Qin]: Looks good to me except the wording about "inter-stream".
> [F & M] We didn't refer to the time difference (offset) between  
> different streams, but to the time difference between joining a  
> multimedia session (or a reference RTP session) and the instant at  
> which all involved media streams can be initially presented to the  
> users in a synchronous way. Therefore, we think this metric gives the  
> INITIAL delay for INTER-STREAM synchronization, because all the  
> involved media streams cannot be synchronously presented to the users  
> until the info needed for synchronizing all of them (included in RTCP  
> packets or in RTP header extensions, as specified in RFC 6051) has  
> been received on all the component RTP sessions. This was our  
> rationale for including the term INTER-STREAM in this definition.

[Qin]: My understanding is that the initial synchronization delay we care about is the total
time difference between the moment the first stream is joined and the moment all the streams
are synchronized. We don't care about the initial synchronization delay between any two
streams in the multimedia session.

So adding "inter-stream" may lead people to think we are measuring the time difference or
synchronization offset between two streams.

>> We understand that the minimization of this metric is important in
>> multimedia streaming services, e.g. for minimizing zapping delays, but
>> we would like to see in this draft the utility of this RFISD block.
>> For example, should the receiver of this report (we assume the media
>> source) do something when receiving it? Is this RFISD block only used
>> for informational purposes?
>> [Qin]: The information in this metric report can be used by the
>> receiver of the report to compare the actual initial synchronization
>> delay to targets (i.e., a numerical objective or Service Level
>> Agreement) to help ensure the quality of real-time application
>> performance.
> [F&M] We suggest adding a similar paragraph to the draft.

[Qin]: Okay.

>> Furthermore, is it expected that all the receivers join the multimedia
>> session (or the group of RTP sessions) almost simultaneously? For
>> instance, there could be significant delay differences between the
>> instants at which different receivers join the same multimedia session
>> (or the set of RTP sessions). In such a case, the measurement of the
>> inter-stream synchronization delay should not have the same reference
>> point for all of the receivers. Do we need some mechanisms to
>> establish the same reference point or to indicate the exact instant
>> for the reference point in each receiver?
>> [Qin]: We allow different receivers to report different initial
>> synchronization delays, since these receivers join at different times.
>> It doesn't matter, since what we report is a per-receiver metric.
> [F & M] Ok. But, in this way, if different receivers, even under  
> similar network conditions (e.g., delays, jitter, etc.), join the session
> at different instants, but inside the same RTCP report interval (i.e.,  
> between two consecutive RTCP packets sent by the media source for that  
> session), there could be significant differences between the "Initial  
> Sync Delays" reported by each one of them. For example, depending on  
> the joining time, a receiver A could receive the RTCP packets from all  
> the RTP sessions with larger/lower initial delay than another receiver  
> B if the latter joined the session before/later than the former.
> Even though you only want to "compare actual initial synchronization  
> delay to targets (i.e., a numerical objective or Service Level  
> Agreement)", as pointed out in your previous comment, could the above  
> situation be problematic?
>> We also think that the terms "start/beginning of session" in the text
>> should be replaced or better explained.
>> - We also think that using 1/65536 second units (giving 15 microsecond
>> accuracy), instead of a 64-bit timestamp, should be accurate enough
>> for Initial Sync Delay reporting. But wouldn't it be more practical and
>> simpler to use the same measurement units for both XR blocks?
>> [Qin]: It looks good to me. However, what accuracy requirements do we
>> have for the initial sync delay? Why is the accuracy of the initial
>> sync delay as currently defined not sufficient?
> [F & M] We think this accuracy is sufficient. Our comment was because  
> of simplicity (this way, both reports could employ the same  
> measurement unit).

[Qin]: Looks good to me. Let's see what other people think.
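
To make the unit discussion concrete, here is a rough Python sketch of a delay field carried in 1/65536-second units. The 32-bit field width, the function names, and the clamping behaviour are assumptions made for illustration only; the all-ones "unavailable" value comes from the draft text quoted later in this message.

```python
UNITS_PER_SECOND = 65536      # one unit = 1/65536 s, roughly 15.26 microseconds
UNAVAILABLE = 0xFFFFFFFF      # all bits set to 1 (assumes a 32-bit field)


def encode_delay(seconds):
    """Return the delay as an integer count of 1/65536-second units."""
    if seconds is None:
        return UNAVAILABLE
    units = round(seconds * UNITS_PER_SECOND)
    # Clamp below the reserved all-ones value (an assumption of this sketch).
    return min(units, UNAVAILABLE - 1)


def decode_delay(units):
    """Return the delay in seconds, or None if the field says 'unavailable'."""
    if units == UNAVAILABLE:
        return None
    return units / UNITS_PER_SECOND
```

With these units a delay of exactly one second is encoded as 65536, and the smallest representable step is about 15 microseconds.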

>> - "SSRC of Media Source" -> Shouldn't this draft specify a policy for
>> choosing the component SSRC of a multimedia session to report on this
>> metric? The draft indicates an arbitrary stream, maybe an option
>> should be the SSRC identifier of a multimedia session with the longest
>> RTCP reporting interval ...
>> [Qin]: no, we may report on each media stream that belongs to the  
>> same multimedia session.
> [F & M] Not sure we are following you here. Must one RFSI block be
> sent for each SSRC (of each one of the involved RTP sessions) in a
> multimedia session? Wouldn't it be easier to report only on one
> reference RTP session?

[Qin]: Sorry, I thought you were commenting on the synchronization offset. I take back what I said here.
I think you are correct: it is more reasonable to report only on one reference RTP session.
> We see different options for selecting the reference RTP session for  
> the "initial sync delay" (i.e. the reference point for this  
> measurement).
> One option could be selecting the session with the longest RTCP
> reporting interval, as you proposed. Here, we think the "delay for
> RTCP reporting interval" concept must be defined precisely to avoid
> possible confusion. On one hand, we can see the delay for the RTCP
> reporting interval for a media session as the time difference between
> two successive RTCP packets from this media session. On the other
> hand, we can also see, from the receiver's point of view, the delay for
> the RTCP reporting interval as the time difference between joining a
> media session and receiving the first RTCP packet from that media
> session. In the latter case, the delay for the RTCP reporting interval
> will depend on the specific joining time of each receiver (each
> receiver can join a session at a different instant during the RTCP
> reporting interval). We propose to clarify this in the draft, if
> adopted.
> Another option could be "choosing the time when the first/ last RTP  
> session is joined as the beginning of the multimedia session", as  
> Rachel proposed.
> We see as a more feasible option to choose the time when a receiver  
> joins the FIRST RTP session of the multimedia session as the  
> starting/reference point for the "initial sync delay" measurements.  
> The use of the RTCP reporting delay for choosing a reference session  
> could be problematic.
> But, in the way we are proposing, we are assuming that the joining  
> time for the first RTP session can be known for all the other RTP  
> sessions involved in the multimedia session (if we do not want to  
> assume this info can be accessible between the RTP sessions, we will  
> have to assume that the joining time is almost the same for all the  
> RTP sessions).
[Qin]: Good points. We did discuss this issue in the meeting. We think choosing the RTP session
with the longest interval is not reasonable, since the report interval may change with session size.
We believe choosing the time when the first RTP session is joined is a good choice
for the measurement starting point.
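
As a rough illustration of that starting point (not text from the draft), the measurement could be sketched in Python as below. The function name, the dictionary layout, and the use of plain seconds are assumptions made only for this example.

```python
def initial_sync_delay(join_times, sync_info_times):
    """Seconds from joining the first RTP session until every session has sync info.

    join_times: dict of session id -> time the receiver joined that session
    sync_info_times: dict of session id -> time the first sync info
        (RTCP SR/SDES, or an RTP header extension) arrived; missing or None
        means it has not been received yet
    Returns None while any session is still missing its sync info.
    """
    if not join_times:
        return None
    arrivals = [sync_info_times.get(sid) for sid in join_times]
    if any(t is None for t in arrivals):
        return None
    start = min(join_times.values())   # reference point: first RTP session joined
    return max(arrivals) - start       # last session to become synchronizable
```

For example, joining audio at t=10.0 and video at t=10.2, with sync info arriving at 11.5 and 12.0 respectively, gives a delay of 2.0 seconds measured from the first join.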

>> Comments regarding the 'Synchronization Offset' Metric:
>> - We also agree that reporting the synchronization offset on a per-report
>> basis (instead of a per-packet basis) can be sufficient.
>> - We think that the definition of this metric is clearer as it is now
>> in the draft.
>> We agree with the importance of minimizing the "sync offset" for
>> guaranteeing QoE. As specified in RFC 3550, synchronization between
>> two media streams, i.e. inter-stream synchronization, can be achieved
>> by using the source identification (i.e. the CNAME item), included in
>> the SDES reports, and the NTP-RTP timestamps correlation info,
>> included in the SRs, from the different media streams.
>> So, as for the other RFISD block, we would like to see the utility of
>> this block in the draft. Should the receiver of the RFSO block (we
>> assume the media source/s) do something (e.g. adaption mechanisms)
>> when receiving it? Is this RFSO block only used for informational
>> purposes? We think this should be clarified in the draft for both XR
>> blocks.
>> [Qin]: I think the information received from the RFSO block is
>> valuable to network managers in troubleshooting network and user
>> experience issues.
>> - The draft specifies a new XR block for reporting Synchronization
>> Offset between correlated media streams. One of the streams is
>> selected as the reference, so we think that some criteria for choosing
>> that reference (i.e., master) media stream should be added in the
>> draft. Otherwise, different receivers could select different streams
>> as the reference one.
>> Now, the draft indicates (on page 6) that the reference stream
>> "can be chosen as the arbitrary stream with minimum delay according to
>> the common criterion defined in section of [Y.1540]".
>> Using this mechanism, different receivers could select different
>> streams as the reference one. Could this be problematic?
>> [Qin]: Good question. We may choose the SSRC identifier of the session
>> in the multimedia session with the longest RTCP reporting interval,
>> since RFSO deals with multiple sessions that belong to the same
>> multimedia session.
> [F & M] Thanks. We are not sure about the suitability of this  
> assumption. We do not see why the "stream with the longest RTCP  
> reporting interval" should be selected as the reference stream for the  
> "sync offset" measurement because of the above discussion. We think  
> other mechanisms should be discussed. Possible options include the  
> most lagged/advanced RTP media streams (i.e. the ones with the  
> highest/lowest reception or presentation, i.e. end-to-end, delays) or  
> a fixed reference stream selected based on other criteria.

[Qin]: Looks better than my proposal. :-)
> Besides, the delay for the RTCP reporting interval does not have to be  
> necessarily linked to the delay for the RTP media stream. A media  
> session could have the longest RTCP reporting interval, but the delay  
> for its RTP media stream could be acceptable.

> For the "sync offset", we think it is better that the reference stream  
> is selected based on the experienced delay for the involved RTP  
> streams, because the sync offset is measured for the RTP streams, and  
> the associated sync adjustments to minimize this offset are also  
> performed on the RTP streams.

[Qin]: Agree.

>> - And, finally, in our opinion, a very important issue is:
>> This draft specifies the Synchronization Offset between correlated
>> media streams taking into account the arrival times of RTP packets for
>> the considered streams (see formula on page 8).
>> Working with reception times may not be accurate enough for use cases
>> with stringent inter-stream synchronization requirements, especially
>> when different types of media streams are involved. This is because
>> the different RTP streams could experience variable delays at the
>> receiver side, i.e. from the reception instant of RTP packets until
>> the instant at which the media units (e.g. video frames or audio
>> samples) included in these RTP packets are played out, mainly due to
>> different de-packetizing, de-payloading, de-coding, rendering,
>> processing delays, etc.  So, if we want to report on accurate sync
>> offset values, we should consider presentation times for the involved
>> media streams, as in our IDMS draft. Do you think this requirement is
>> also needed for this draft?
>> [Qin]: Not sure about this. Could you clarify how presentation times
>> would be used to calculate the sync offset?
>> Another issue when measuring "sync offset" (per report basis)
>> considering RTP arrival times is that this measurement can be
>> significantly affected by the existence of network jitter. As the
>> streams are sent independently, the RTP packets (or the specific RTP
>> packet for which this metric is reported) of media stream A could
>> experience low jitter delays, whilst the RTP packets (or the specific
>> RTP packet for which this metric is reported) of media stream B could
>> (sporadically) experience high jitter delays, so this would lead to
>> the reporting of high and variable "sync offsets" values. So, this
>> will not provide a smooth, but a variable, measurement.
>> [Qin]: We have proposed changing the RTP timestamps to NTP timestamps.
> [F & M] In order to use presentation times, we would need to track RTP  
> packets from their arrival to their presentation (or play out) times.  
> This can be seen as a form of layer-violation in some RTP  
> implementations, as previously discussed in the AVTCORE list for our  
> IDMS draft. That is the reason why, in our IDMS draft, reporting on  
> presentation times is optional, but reporting on arrival times is  
> mandatory.
> But, if presentation times are supported, the sync offset could be
> calculated easily (and more accurately than in the current proposal).
> The calculation is as follows:
> Different times for stream A: t_i_A (transmission time, i.e. RTP  
> timestamp of i-th RTP packet of stream A), r_i_A (NTP-based arrival  
> time of i-th RTP packet of stream A), p_i_A (NTP-based presentation  
> time of i-th RTP packet of stream A)
> Different times for stream B: t_j_B (transmission time, i.e. RTP  
> timestamp of j-th RTP packet of stream B), r_j_B (NTP-based arrival  
> time of j-th RTP packet of stream B), p_j_B (NTP-based presentation  
> time of j-th RTP packet of stream B).
> Therefore, the sync offset between stream A and B can be calculated as:
> - Using presentation times: (p_i_A - t_i_A) - (p_j_B - t_j_B), this  
> gives the end-to-end delay variability
> - Using reception times: (r_i_A - t_i_A) - (r_j_B - t_j_B), this gives  
> the network delay variability
> * Note that we are assuming that RTP timestamps can be mapped to  
> NTP-format timestamps (for the RTP transmission timestamps), based on  
> the correlation timing info included in RTCP SRs.
> Therefore, we think that using presentation timestamps the measurement  
> of the "sync offset" metric will be more accurate (and smoother) than  
> using reception times, because of the variable delays at the  
> distribution and at the receiver sides.

[Qin]: Your proposal uses presentation timestamps, which is quite different from what
we are currently proposing in the draft. We will evaluate your proposal to see which approach
is better.
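
For reference, the two calculations quoted above can be sketched in a few lines of Python. The function name and the sample values are made up for illustration; the sketch assumes all timestamps are already on a common NTP-derived timeline (RTP timestamps mapped via RTCP SRs) and expressed in integer milliseconds.

```python
def sync_offset(t_a, x_a, t_b, x_b):
    """Compute (x_A - t_A) - (x_B - t_B).

    t_* is the transmission time of the packet on each stream; x_* is either
    its arrival time (r) or its presentation time (p), on the same clock.
    """
    return (x_a - t_a) - (x_b - t_b)


# Hypothetical sample values in milliseconds, for illustration only.
t_a, r_a, p_a = 100_000, 100_040, 100_140   # stream A: sent, arrived, presented
t_b, r_b, p_b = 100_000, 100_060, 100_100   # stream B: sent, arrived, presented

offset_reception = sync_offset(t_a, r_a, t_b, r_b)      # network delays only
offset_presentation = sync_offset(t_a, p_a, t_b, p_b)   # end-to-end delays
```

In this made-up case the reception-based offset is -20 ms while the presentation-based offset is +40 ms, so receiver-side processing delays can change both the size and the sign of the reported value.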

>> - For both XR blocks defined in this draft, it is stated that: "If the
>> measurement is unavailable, the value of this field with all bits set
>> to 1 SHOULD be reported". But, if the measurements are unavailable,
>> why are these XR blocks needed? Would it not be better to simply not
>> send these XR blocks?
>> [Qin]: These XR blocks may be sent in each RTCP report interval. If
>> we do not send them, the receiver of the XR blocks will regard them
>> as lost, which is not what we expect.
>> - Finally, we think that this draft should indicate when the proposed
>> XR blocks are sent. Should the RFISD block be sent only once per media
>> session? Should the RFSO block be sent in each RTCP report interval in
>> a compound RTCP packet?
>> [Qin]: For RFISD, both are allowed, but I think it makes more sense to
>> send it only once per media session.
> [F & M] We assume that these RTCP report blocks will be sent in  
> compound RTCP packets. Regarding the RFSI block, we assume that it is  
> only needed once per session (since it reports on INITIAL Sync Delay,  
> it is not needed during the session lifetime). Regarding the RFSO  
> block, we think that two options could be employed: 1) to send this  
> block in each RTCP report interval, independently of the value of the  
> "sync offset"; 2) to send this block only if the value of the "sync  
> offset" exceeds a configurable allowed asynchrony threshold.
> Therefore, if these XR blocks are not included in the received RTCP  
> compound packet, we think it can be assumed that the metrics these  
> blocks report on are not available, without the need of sending the  
> block reports (because in such a case, these reports do not contain  
> any further statistics).

[Qin]: What if these RTCP report blocks are not sent in the compound RTCP packet?
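
To make the two RFSO sending options from Fernando and Mario concrete, here is a small Python sketch. It models their proposal only (omit the block when no measurement is available), not the current draft behaviour of sending an all-ones value; the function name, parameter names, and the use of milliseconds are assumptions for illustration.

```python
def should_send_rfso(offset_ms, threshold_ms=None):
    """Decide whether to include the sync offset block in this RTCP interval.

    threshold_ms=None models option 1: send in every interval in which a
    measurement exists. A number models option 2: send only when the
    absolute offset exceeds the allowed asynchrony threshold.
    """
    if offset_ms is None:       # no measurement available: omit the block
        return False
    if threshold_ms is None:    # option 1
        return True
    return abs(offset_ms) > threshold_ms   # option 2
```

Under option 2 a receiver of the report cannot tell "offset within threshold" apart from "measurement unavailable" or "report lost" by the absence of the block alone, which is the ambiguity raised in the question above.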

> Best Regards,
> Fernando & Mario.