Re: [xrblock] ??: Comments on draft-ietf-xrblock-rtcp-xr-synchronization-01

Mario Montagud Climent <> Mon, 26 November 2012 13:43 UTC

Date: Mon, 26 Nov 2012 14:43:19 +0100
From: Mario Montagud Climent <>
To: Qin Wu <>
List-Id: Metric Blocks for use with RTCP's Extended Report Framework working group discussion list <>

Hi Qin, all,

We were happy to help! :)

See our comments inline.

Qin Wu <> wrote:

> Hi, Mario:
> Thanks for your lengthy comments. :-) I have removed the comments on
> which we have no issues.
> Please see my reply inline below.
> Regards!
> -Qin
> ----- Original Message -----
> From: "Mario Montagud Climent" <>
> To: "Qin Wu" <>
> Cc: <>; "Huangyihong (Rachel)"  
> <>; "Hitoshi Asaeda" <>
> Sent: Friday, November 09, 2012 3:30 AM
> Subject: Re: ??: Comments on draft-ietf-xrblock-rtcp-xr-synchronization-01
>> Hi Qin, Rachel, all,
>> See our comments inline.
>> Qin Wu <> wrote:
>>> Hi, Mario and Fernando:
>>> Thanks for your valuable reviews. Let me try to address your concerns.
>>> Also please see my reply below inline.
>>> Regards!
>>> -Qin
>>> -----Original Message-----
>>> From: Mario Montagud Climent []
>>> Sent: 4 November 2012 2:52
>>> To:
>>> Cc: Huangyihong (Rachel); Hitoshi Asaeda; Qin Wu
>>> Subject: Comments on draft-ietf-xrblock-rtcp-xr-synchronization-01
>>> Hi all,
>>> We (Fernando Boronat and I) have reviewed the updated version of
>>> draft-ietf-xrblock-rtcp-xr-synchronization-01 and read the issues
>>> related to this draft that were raised recently on the mailing list.
>>> Here are our comments and suggestions:
>>> Comments regarding the 'Initial Synchronization Delay' Metric:
>>> - We still find the definition of this metric a bit confusing.
>>> We are dealing with INTER-STREAM Synchronization Delay, so we think
>>> the word INTER-STREAM should be added in the definition of this metric
>>> for better clarity.
>>> [Qin]: The initial synchronization delay we use for this metric
>>> is clearly specified in RFC 6051. The key feature is "initial"
>>> rather than "inter-stream".
>>> I don't believe it refers to the time difference between two
>>> streams. Rather, it means how long it takes to receive all the
>>> components of a multimedia session or layered session. But I agree
>>> that "inter-stream" applies to the synchronization offset metric
>>> we define in this draft.
>>> As we understand from the RFC 6051, an appropriate definition could
>>> be: "In multimedia streaming services, the (inter-stream)
>>> synchronization delay refers to the time difference between the moment
>>> a user joins a (multicast) multimedia session, probably involving more
>>> than one media stream (e.g., audio and video, or when using layered
>>> and/or multi-description codecs), and the instant when the correlated
>>> media streams can be synchronously presented to that user, i.e. when
>>> RTCP packets (including SDES and SR reports), or when the first RTP
>>> packets with header extensions including in-band synchronization
>>> metadata, have been received on all the involved RTP sessions in the
>>> multimedia session".
>>> [Qin]: Looks good to me except the wording about "inter-stream".
>> [F & M] We didn't refer to the time difference (offset) between
>> different streams, but to the time difference between joining a
>> multimedia session (or a reference RTP session) and the instant at
>> which all involved media streams can be initially presented to the
>> users in a synchronous way. Therefore, we think this metric gives the
>> INITIAL delay for INTER-STREAM synchronization, because all the
>> involved media streams cannot be synchronously presented to the users
>> until the info needed for synchronizing all of them (included in RTCP
>> packets or in RTP header extensions, as specified in RFC 6051) has
>> been received on all the component RTP sessions. This was our
>> rationale for including the term INTER-STREAM in this definition.
> [Qin]: My understanding is that the initial synchronization delay we
> care about is the total time between the moment the first stream is
> joined and the moment all the streams are synchronized.
> We don't care about the initial synchronization delay between any
> two streams in the multimedia session.
> So adding "inter-stream" may lead people to think we are measuring
> the time difference or synchronization offset between two streams.

[F&M] If you think adding "inter-stream" could confuse readers, that
is fine with us.
But in compound sessions, we think "inter-stream" sync relates to the  
timing offset between all involved media streams, not only between two  
streams. Anyway, it is up to you.

>>> We understand that the minimization of this metric is important in
>>> multimedia streaming services, e.g. for minimizing zapping delays, but
>>> we would like the draft to explain the utility of this RFISD block.
>>> For example, should the receiver of this report (we assume the media
>>> source) do something when receiving it? Is this RFISD block only used
>>> for informational purposes?
>>> [Qin]: The information in this metric report can be used by the
>>> receiver of the report to compare the actual initial
>>> synchronization delay to targets (i.e., a
>>>    numerical objective or Service Level Agreement) to help ensure the
>>>    quality of real-time application performance.
>> [F&M] We suggest adding a similar paragraph to the draft.
> [Qin]: Okay.
>>> Furthermore, is it expected that all the receivers join the multimedia
>>> session (or the group of RTP sessions) almost simultaneously? For
>>> instance, there could be significant delay differences between the
>>> instants at which different receivers join the same multimedia session
>>> (or the set of RTP sessions). In such a case, the measurement of the
>>> inter-stream synchronization delay should not have the same reference
>>> point for all of the receivers. Do we need some mechanisms to
>>> establish the same reference point or to indicate the exact instant
>>> for the reference point in each receiver?
>>> [Qin]: We allow different receivers to report different initial
>>> synchronization delays, since these receivers join at different
>>> times. It doesn't matter, since what we report is a per-receiver
>>> metric.
>> [F & M] Ok. But, in this way, if different receivers, even under
>> similar network conditions (e.g., delays, jitter...), join the session
>> at different instants, but inside the same RTCP report interval (i.e.,
>> between two consecutive RTCP packets sent by the media source for that
>> session), there could be significant differences between the "Initial
>> Sync Delays" reported by each one of them. For example, depending on
>> the joining time, a receiver A could receive the RTCP packets from all
>> the RTP sessions with larger/lower initial delay than another receiver
>> B if the latter joined the session before/later than the former.
>> Even though you only want to "compare actual initial synchronization
>> delay to targets (i.e., a numerical objective or Service Level
>> Agreement)", as pointed out in your previous comment, could the above
>> situation be problematic?
>>> We also think that the terms "start/beginning of session" in the text
>>> should be replaced or better explained.
>>> - We also think that using 1/65536 second units (giving about 15
>>> microsecond accuracy), instead of a 64-bit timestamp, should be
>>> accurate enough for Initial Sync Delay reporting. But wouldn't it be
>>> more practical and simpler to use the same measurement units for
>>> both XR blocks?
>>> [Qin]: It looks good to me. However, what accuracy requirements do
>>> we have for the initial sync delay?
>>> Why is the accuracy of the initial sync delay as currently defined
>>> not sufficient?
>> [F & M] We think this accuracy is sufficient. Our comment was
>> motivated by simplicity (this way, both reports could employ the
>> same measurement unit).
> [Qin]: Looks good to me. Let's see what other people think.

[F&M] Ok
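
As a rough illustration of the unit discussed above, the following Python sketch converts a delay in seconds to integer 1/65536-second units and back. The function names are hypothetical and the sketch makes no assumption about the field width used in the draft.

```python
# Illustrative only: names are hypothetical and not taken from the draft.
UNITS_PER_SECOND = 65536  # one unit is 1/65536 of a second


def seconds_to_units(delay_seconds: float) -> int:
    """Convert a non-negative delay in seconds to 1/65536-second units."""
    if delay_seconds < 0:
        raise ValueError("delay must be non-negative")
    return round(delay_seconds * UNITS_PER_SECOND)


def units_to_seconds(units: int) -> float:
    """Convert 1/65536-second units back to seconds."""
    return units / UNITS_PER_SECOND


# Resolution of one unit in microseconds (about 15.26 us).
resolution_us = 1e6 / UNITS_PER_SECOND
```

One unit is 1/65536 s, i.e. roughly 15.26 microseconds, which matches the "15 microsecond" figure quoted above.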

>>> - "SSRC of Media Source" -> Shouldn't this draft specify a policy for
>>> choosing the component SSRC of a multimedia session to report on this
>>> metric? The draft indicates an arbitrary stream; one option could be
>>> the SSRC identifier of the session in the multimedia session with
>>> the longest RTCP reporting interval ...
>>> [Qin]: no, we may report on each media stream that belongs to the
>>> same multimedia session.
>> [F & M] We are not sure we follow you here.
>> Must one RFISD block be
>> sent for each SSRC (of each one of the involved RTP sessions) in a
>> multimedia session? Wouldn't it be easier to report on only one
>> reference RTP session?
> [Qin]: Sorry, I thought you were commenting on the synchronization
> offset. I take back what I said here.
> I think you are correct. It is more reasonable to report on only one
> reference RTP session.

[F&M] Ok :) Thanks!

>> We see different options for selecting the reference RTP session for
>> the "initial sync delay" (i.e. the reference point for this
>> measurement).
>> One option could be selecting the session with the longest RTCP
>> reporting interval, as you proposed. Here, we think the "delay for
>> RTCP reporting interval" concept must be defined precisely to avoid
>> possible confusion. On one hand, we can see the delay for the RTCP
>> reporting interval for a media session as the time difference between
>> two successive RTCP packets from this media session. On the other
>> hand, we can also see, from the receiver point of view, the delay for
>> the RTCP reporting interval as the time difference between joining a
>> media session and receiving the first RTCP packet from that media
>> session. In the latter case, the delay for the RTCP reporting interval
>> will depend on the specific joining time of each receiver (each
>> receiver can join a session at a different instant during the RTCP
>> reporting interval). We propose to clarify this in the draft, if
>> adopted.
>> Another option could be "choosing the time when the first/ last RTP
>> session is joined as the beginning of the multimedia session", as
>> Rachel proposed.
>> We think a more feasible option is to use the time when a receiver
>> joins the FIRST RTP session of the multimedia session as the
>> starting/reference point for the "initial sync delay" measurements.
>> The use of the RTCP reporting delay for choosing a reference session
>> could be problematic.
>> But, in the way we are proposing, we are assuming that the joining
>> time for the first RTP session can be known for all the other RTP
>> sessions involved in the multimedia session (if we do not want to
>> assume this info can be accessible between the RTP sessions, we will
>> have to assume that the joining time is almost the same for all the
>> RTP sessions).
> [Qin]: Good points. We did discuss this issue in the meeting. We
> think choosing the RTP session with the longest interval is not
> reasonable, since the report interval may change with session size.
> We believe the time when the first RTP session is joined is
> a good choice for the measurement starting point.

[F&M] Ok
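
The measurement agreed on above can be sketched in Python as follows. This is only an illustration of our reading of the thread: the starting point is the time the receiver joins the first RTP session, and the end point is the time at which synchronization information (RTCP SR/SDES, or an RTP header extension as in RFC 6051) has been received on every component session. The function name, the session labels, and the use of seconds as the unit are assumptions.

```python
from typing import Dict


def initial_sync_delay(first_join_time: float,
                       sync_info_times: Dict[str, float]) -> float:
    """Delay between joining the first RTP session and the moment sync
    info has been received on all component RTP sessions (seconds).

    sync_info_times maps a (hypothetical) session label to the time at
    which that session's synchronization info was first received.
    """
    if not sync_info_times:
        raise ValueError("at least one RTP session is required")
    # Streams can be presented synchronously only once the slowest
    # session has delivered its synchronization info.
    ready_time = max(sync_info_times.values())
    return ready_time - first_join_time


delay = initial_sync_delay(10.0, {"audio": 10.5, "video": 12.25})
```

With these example values, the delay is governed by the video session, the last one to deliver its synchronization info.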

>>> Comments regarding the 'Synchronization Offset' Metric:
>>> - We also agree that reporting synchronization offset on a per-report
>>> basis (instead of a per-packet basis) can be sufficient.
>>> - We think that the definition of this metric is clearer as it is now
>>> in the draft.
>>> We agree with the importance of minimizing the "sync offset" for
>>> guaranteeing QoE. As specified in RFC 3550, synchronization between
>>> two media streams, i.e. inter-stream synchronization, can be achieved
>>> by using the source identification (i.e. the CNAME item), included in
>>> the SDES reports, and the NTP-RTP timestamps correlation info,
>>> included in the SRs, from the different media streams.
>>> So, as with the RFISD block, we would like the draft to explain the
>>> utility of this block. Should the receiver of the RFSO block (we
>>> assume the media source/s) do something (e.g. adaptation mechanisms)
>>> when receiving it? Is this RFSO block only used for informational
>>> purposes? We think this should be clarified in the draft for both XR
>>> blocks.
>>> [Qin]: I think the information received from the RFSO block is
>>>    valuable to network managers in troubleshooting network and user
>>>    experience issues.
>>> - The draft specifies a new XR block for reporting Synchronization
>>> Offset between correlated media streams. One of the streams is
>>> selected as the reference, so we think that some criteria for choosing
>>> that reference (i.e., master) media stream should be added in the
>>> draft. Otherwise, different receivers could select different streams
>>> as the reference one.
>>> Now, the draft indicates (on page 6) that the reference stream
>>> "can be chosen as the arbitrary stream with minimum delay according to
>>> the common criterion defined in section of [Y.1540]".
>>> Using this mechanism, different receivers could select different
>>> streams as the reference one. Could this be problematic?
>>> [Qin]: Good question. We may choose
>>> the SSRC identifier of the session in the multimedia session with the
>>> longest RTCP reporting interval, since RFSO deals with multiple
>>> sessions that belong to the same multimedia session.
>> [F & M] Thanks. We are not sure about the suitability of this
>> assumption. We do not see why the "stream with the longest RTCP
>> reporting interval" should be selected as the reference stream for the
>> "sync offset" measurement because of the above discussion. We think
>> other mechanisms should be discussed. Possible options include the
>> most lagged/advanced RTP media streams (i.e. the ones with the
>> highest/lowest reception or presentation, i.e. end-to-end, delays) or
>> a fixed reference stream selected based on other criteria.
> [Qin]: Looks better than my proposal. :-)

[F&M] Ok. Thanks!
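
One of the options mentioned above, selecting the most lagged (or most advanced) stream as the reference, could look like the following Python sketch. It is not a proposal for normative text: the function name, the stream labels, and the tie-breaking rule (alphabetical order of labels, so that all receivers with the same measurements pick the same stream) are assumptions.

```python
from typing import Dict


def pick_reference_stream(delays: Dict[str, float],
                          most_lagged: bool = True) -> str:
    """Return the label of the reference stream.

    delays maps a (hypothetical) stream label to its measured delay in
    seconds (reception or presentation delay, whichever is in use).
    With most_lagged=True the stream with the highest delay is chosen,
    otherwise the one with the lowest delay.
    """
    if not delays:
        raise ValueError("at least one stream is required")
    chooser = max if most_lagged else min
    # Iterate over sorted labels so that ties are broken deterministically.
    return chooser(sorted(delays), key=delays.__getitem__)


reference = pick_reference_stream({"audio": 0.08, "video": 0.12})
```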

>> Besides, the delay for the RTCP reporting interval does not have to be
>> necessarily linked to the delay for the RTP media stream. A media
>> session could have the longest RTCP reporting interval, but the delay
>> for its RTP media stream could be acceptable.
>> For the "sync offset", we think it is better that the reference stream
>> is selected based on the experienced delay for the involved RTP
>> streams, because the sync offset is measured for the RTP streams, and
>> the associated sync adjustments to minimize this offset are also
>> performed on the RTP streams.
> [Qin]: Agree.
>>> - And, finally, in our opinion, a very important issue is:
>>> This draft specifies the Synchronization Offset between correlated
>>> media streams taking into account the arrival times of RTP packets for
>>> the considered streams (see formula on page 8).
>>> Working with reception times may not be accurate enough for use cases
>>> with stringent inter-stream synchronization requirements, especially
>>> when different types of media streams are involved. This is because
>>> the different RTP streams could experience variable delays at the
>>> receiver side, i.e. from the reception instant of RTP packets until
>>> the instant at which the media units (e.g. video frames or audio
>>> samples) included in these RTP packets are played out, mainly due to
>>> different de-packetizing, de-payloading, de-coding, rendering,
>>> processing delays, etc.  So, if we want to report on accurate sync
>>> offset values, we should consider presentation times for the involved
>>> media streams, as in our IDMS draft. Do you think this requirement is
>>> also needed for this draft?
>>> [Qin]: Not sure about this, would you like to clarify how to use
>>> presentation times to calculate sync offset?
>>> Another issue when measuring "sync offset" (per report basis)
>>> considering RTP arrival times is that this measurement can be
>>> significantly affected by the existence of network jitter. As the
>>> streams are sent independently, the RTP packets (or the specific RTP
>>> packet for which this metric is reported) of media stream A could
>>> experience low jitter delays, whilst the RTP packets (or the specific
>>> RTP packet for which this metric is reported) of media stream B could
>>> (sporadically) experience high jitter delays, which would lead to
>>> the reporting of high and variable "sync offset" values. So the
>>> measurement would be variable rather than smooth.
>>> [Qin]: We have proposed to change RTP time stamp into NTP timestamps.
>> [F & M] In order to use presentation times, we would need to track RTP
>> packets from their arrival to their presentation (or play out) times.
>> This can be seen as a form of layer-violation in some RTP
>> implementations, as previously discussed in the AVTCORE list for our
>> IDMS draft. That is the reason why, in our IDMS draft, reporting on
>> presentation times is optional, but reporting on arrival times is
>> mandatory.
>> But, if presentation times are supported, the sync offset could be
>> calculated easily, and more accurately than in the current proposal.
>> The calculation is as follows:
>> Different times for stream A: t_i_A (transmission time, i.e. RTP
>> timestamp of i-th RTP packet of stream A), r_i_A (NTP-based arrival
>> time of i-th RTP packet of stream A), p_i_A (NTP-based presentation
>> time of i-th RTP packet of stream A)
>> Different times for stream B: t_j_B (transmission time, i.e. RTP
>> timestamp of j-th RTP packet of stream B), r_j_B (NTP-based arrival
>> time of j-th RTP packet of stream B), p_j_B (NTP-based presentation
>> time of j-th RTP packet of stream B).
>> Therefore, the sync offset between stream A and B can be calculated as:
>> - Using presentation times: (p_i_A - t_i_A) - (p_j_B - t_j_B), this
>> gives the end-to-end delay variability
>> - Using reception times: (r_i_A - t_i_A) - (r_j_B - t_j_B), this gives
>> the network delay variability
>> * Note that we are assuming that RTP timestamps can be mapped to
>> NTP-format timestamps (for the RTP transmission timestamps), based on
>> the correlation timing info included in RTCP SRs.
>> Therefore, we think that using presentation timestamps the measurement
>> of the "sync offset" metric will be more accurate (and smoother) than
>> using reception times, because of the variable delays at the
>> distribution and at the receiver sides.
> [Qin]: Your proposal uses presentation timestamps, which is quite
> different from what we are currently proposing in the draft. We will
> examine your proposal to see which approach is better.

[F&M] Ok.
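
For reference, the two calculations quoted above can be sketched in Python as follows. The sketch assumes all three times of a packet are already expressed on a common NTP-based timescale in seconds (i.e. the RTP timestamp has been mapped using the RTCP SR correlation info); the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PacketTimes:
    sent: float       # t: transmission time (RTP timestamp mapped to NTP)
    received: float   # r: NTP-based arrival time
    presented: float  # p: NTP-based presentation (play-out) time


def sync_offset_presentation(a: PacketTimes, b: PacketTimes) -> float:
    """(p_i_A - t_i_A) - (p_j_B - t_j_B): end-to-end delay difference."""
    return (a.presented - a.sent) - (b.presented - b.sent)


def sync_offset_reception(a: PacketTimes, b: PacketTimes) -> float:
    """(r_i_A - t_i_A) - (r_j_B - t_j_B): network delay difference."""
    return (a.received - a.sent) - (b.received - b.sent)


# Example values in seconds, chosen only for illustration.
pkt_a = PacketTimes(sent=100.0, received=100.25, presented=100.5)
pkt_b = PacketTimes(sent=100.0, received=100.125, presented=100.25)
offset_p = sync_offset_presentation(pkt_a, pkt_b)
offset_r = sync_offset_reception(pkt_a, pkt_b)
```

In this made-up example the two offsets differ because the streams spend different amounts of time in the receiver between arrival and play-out, which is the effect described above.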

>>> - For both XR blocks defined in this draft, it is stated that: "If the
>>> measurement is unavailable, the value of this field with all bits set
>>> to 1 SHOULD be reported". But, if the measurements are unavailable,
>>> why are these XR blocks needed? Would it not be better to simply not
>>> send these XR blocks?
>>> [Qin]: These XR blocks may be sent in each RTCP report interval. If
>>> we do not send them to the receiver of the XR blocks,
>>> the receiver will regard these XR blocks as lost, which is not what
>>> we expect.
>>> - Finally, we think that this draft should indicate when the proposed
>>> XR blocks are sent. Should the RFISD block be sent only once per media
>>> session? Should the RFSO block be sent in each RTCP report interval in
>>> a compound RTCP packet?
>>> [Qin]: For RFISD, both are allowed, but I think it makes more sense
>>> to send it only once per media session.
>> [F & M] We assume that these RTCP report blocks will be sent in
>> compound RTCP packets. Regarding the RFISD block, we assume that it is
>> only needed once per session (since it reports on INITIAL Sync Delay,
>> it is not needed during the session lifetime). Regarding the RFSO
>> block, we think that two options could be employed: 1) to send this
>> block in each RTCP report interval, independently of the value of the
>> "sync offset"; 2) to send this block only if the value of the "sync
>> offset" exceeds a configurable allowed asynchrony threshold.
>> Therefore, if these XR blocks are not included in the received RTCP
>> compound packet, we think it can be assumed that the metrics these
>> blocks report on are not available, without the need of sending the
>> block reports (because in such a case, these reports do not contain
>> any further statistics).
> [Qin]: What if these RTCP report blocks are not sent in a
> compound RTCP packet?

[F&M] Do you mean sending these reports as immediate and reduced-size  
RTCP packets?
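
Returning to the two RFSO sending options quoted above, a minimal Python sketch of the decision could look like this. It follows our suggestion of omitting the block when no measurement is available, which is not what the draft currently specifies; the function name and the use of seconds are assumptions.

```python
from typing import Optional


def should_send_rfso(offset: Optional[float],
                     threshold: Optional[float] = None) -> bool:
    """Decide whether to include an RFSO block in the next RTCP report.

    offset:    measured sync offset in seconds, or None if unavailable.
    threshold: allowed asynchrony in seconds; None selects option 1
               (send in every report interval).
    """
    if offset is None:
        # Suggested behaviour: omit the block instead of sending an
        # all-ones "unavailable" value.
        return False
    if threshold is None:
        return True  # option 1: always send
    return abs(offset) > threshold  # option 2: send only above threshold
```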

Best Regards,

Fernando & Mario

>> Best Regards,
>> Fernando & Mario.