Re: [AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision

"Brandenburg, R. (Ray) van", Mon, 24 October 2011 15:28 UTC

From: "Brandenburg, R. (Ray) van"
To: Qin Wu, Kevin Gross, IETF AVTCore WG


To address both the ETSI compatibility issue and the need for high-precision 64-bit timestamps, Kevin and I have come up with the following solution:

The current draft describes both an IDMS XR block (for SC->MSAS communication) and an RTCP IDMS Packet Type (for MSAS->SC communication). I would like to propose that we change the length of the Presentation Timestamp in the IDMS Packet Type to 64 bits, while keeping the Presentation Timestamp in the IDMS XR block at 32 bits. This way the MSAS is able to send very precise control commands to the synchronization clients, while the SC->MSAS measurements remain compatible with ETSI. Since the MSAS will add a safety margin to the measured packet presentation times anyway (e.g. to allow for jitter), it does not need 64-bit precision measurements.

Do you agree with this solution?


-----Original Message-----
From: Brandenburg, R. (Ray) van
Sent: Friday, 21 October 2011 11:45
To: Qin Wu; Kevin Gross; IETF AVTCore WG
Subject: Re: [AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision

Ok. I will work together with Kevin to create a new version of the draft.

In the meantime we have to try to come to consensus on the Initial Playout Sync proposal of Fernando Boronat so that we can post a new ID before the next meeting deadline. 

If I understand correctly, the main question here is the manner in which the delay between the Media Sender and the SC is determined.


-----Original Message-----
From: Qin Wu
Sent: Friday, 21 October 2011 11:37
To: Brandenburg, R. (Ray) van; Kevin Gross; IETF AVTCore WG
Subject: Re: [AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision

----- Original Message ----- 
From: "Brandenburg, R. (Ray) van"
To: "Qin Wu"; "Kevin Gross"; "IETF AVTCore WG"
Sent: Friday, October 21, 2011 5:19 PM
Subject: RE: [AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision

Hi Qin,

Thanks for comments. 

> Is it important to save packet size with 32 bits at the cost of accuracy?

> It would be good to see the presentation times follow the 64-bit NTP timestamp format defined in RFC 3550 for SR/RR reports, unless you have a stronger justification than that this would not be compliant with the ETSI TISPAN standard.

These statements seem to be contradictory. Do you mean you want to see the presentation field extended to 64 bits, or do you want to keep it at 32 bits to save packet length?

[Qin]: I prefer the former.

-----Original Message-----
From: Qin Wu
Sent: Friday, 21 October 2011 10:33
To: Brandenburg, R. (Ray) van; Kevin Gross; IETF AVTCore WG
Subject: Re: [AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision

Is it important to save packet size with 32 bits at the cost of accuracy?
It would be good to see the presentation times follow the 64-bit NTP timestamp format defined in RFC 3550 for SR/RR reports, unless you have a stronger justification than that this would not be compliant with the ETSI TISPAN standard.
But I am not the judge; this is just my personal opinion.

----- Original Message -----
From: "Brandenburg, R. (Ray) van"
To: "Qin Wu"; "Kevin Gross"; "IETF AVTCore WG"
Sent: Friday, October 21, 2011 3:13 PM
Subject: RE: [AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision

Hi Qin,

Yes. However, the current IDMS draft uses a 64-bit NTP timestamp to report Packet Receipt Times and a 32-bit NTP timestamp to report Packet Presentation Times (plus a 32-bit RTP timestamp). The reason we initially went with only a 32-bit NTP timestamp for Packet Presentation Times was that we figured that the 16 most significant bits of the NTP time (i.e. roughly the date) would be the same for packet presentation and packet reception times, and that the least significant 16 bits (for sub-15-microsecond accuracy) would not be necessary. We therefore went with the middle 32 bits to limit the length of the IDMS packet.
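The truncation described above keeps bits 47..16 of the 64-bit NTP timestamp, the same "middle 32 bits" layout RFC 3550 uses for the LSR field. A minimal Python sketch of truncating and rebuilding such a value follows. It is not the draft's normative procedure: the function names are hypothetical, and it assumes the receiver restores the missing upper 16 bits from the 64-bit Packet Receipt Time, with the presentation time within about 9 hours (half of 65536 s) of that receipt time.

```python
MASK64 = (1 << 64) - 1
ERA_SPAN = 1 << 48  # value span of the middle 32 bits (65536 s in NTP units)


def to_middle32(ntp64: int) -> int:
    """Keep bits 47..16 of a 64-bit NTP timestamp: the low 16 bits of the
    seconds field followed by the high 16 bits of the fraction field."""
    return (ntp64 >> 16) & 0xFFFFFFFF


def from_middle32(mid32: int, ref64: int) -> int:
    """Rebuild a 64-bit NTP timestamp from its middle 32 bits (hypothetical helper).

    The upper 16 seconds bits are borrowed from ref64 (assumed here to be the
    64-bit Packet Receipt Time). Of the three candidates around that value,
    the one nearest ref64 is chosen, so a rollover of the lower 16 seconds
    bits between the two times is handled. The 16 dropped fraction bits
    come back as zero.
    """
    candidate = (ref64 & ~(ERA_SPAN - 1)) | (mid32 << 16)
    best = min(
        (candidate - ERA_SPAN, candidate, candidate + ERA_SPAN),
        key=lambda c: abs(c - ref64),
    )
    return best & MASK64  # wrap into the 64-bit NTP range


if __name__ == "__main__":
    receipt = (3528000000 << 32) | 0x12345678   # arbitrary example receipt time
    presentation = receipt + (5 << 32) + 0x0000ABCD
    restored = from_middle32(to_middle32(presentation), receipt)
    error_s = (presentation - restored) / 2**32
    print(f"truncation error: {error_s * 1e6:.3f} microseconds")
```

Whatever the inputs, the rebuilt value differs from the original only in the 16 dropped fraction bits, so the error is always below 2^-16 s (about 15.26 microseconds), which is the precision limit discussed in this thread.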

What Kevin now suggests is that in some use cases sub-15-microsecond accuracy is necessary. He therefore proposes to increase the Packet Presentation timestamp to 64 bits. The only disadvantage of this approach (apart from a slightly larger packet) is that it would break compatibility with the ETSI TISPAN standard.

Which way do you think we should go here?

Best Regards,


-----Original Message-----
From: Qin Wu
Sent: Friday, 21 October 2011 4:48
To: Kevin Gross; IETF AVTCore WG
Subject: Re: [AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision

Aren't the 64-bit NTP timestamp and the 32-bit RTP timestamp consistent with the ones defined in RFC 3550?

----- Original Message -----
From: "Kevin Gross"
To: "IETF AVTCore WG"
Sent: Thursday, October 20, 2011 10:59 AM
Subject: [AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision

Part of my reason for bringing these use cases to the attention of the
group relates to a still unresolved comment I made against the IDMS
draft.
The draft proposes 64-bit timestamps for arrival times but only 32-bit
timestamps for presentation times. A 32-bit NTP timestamp has a
15-microsecond precision. Presentation time synchronization with these
timestamps would not clearly meet the requirements of a couple of
these use cases.
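The precision figures here follow directly from the width of the fraction field. The Python sketch below (names hypothetical) computes the resolution of the middle-32 format (2^-16 s) and of the full 64-bit format (2^-32 s) and compares them with tolerances taken from the use cases quoted below. It uses a deliberately crude test, treating a format as adequate when one LSB is smaller than the tolerance; a real assessment would also budget for clock error and jitter.

```python
from fractions import Fraction

# Seconds per least-significant bit of each NTP timestamp format.
RES_MIDDLE32 = Fraction(1, 1 << 16)   # 16 fraction bits survive
RES_FULL64 = Fraction(1, 1 << 32)     # all 32 fraction bits

# Tolerances taken from the use cases in this thread, in seconds.
TOLERANCES = {
    "stereo latency change": Fraction(10, 1_000_000),
    "48 kHz sample period": Fraction(1, 48_000),
    "nearby conference speakers": Fraction(5, 1_000),
}


def resolves(tolerance: Fraction, resolution: Fraction) -> bool:
    """Crude adequacy test: one LSB must be smaller than the tolerance."""
    return resolution < tolerance


if __name__ == "__main__":
    print(f"middle-32 LSB: {float(RES_MIDDLE32) * 1e6:.4f} us")
    print(f"full-64 LSB:   {float(RES_FULL64) * 1e12:.1f} ps")
    for name, tol in TOLERANCES.items():
        print(f"{name:28s} {float(tol) * 1e6:9.2f} us  "
              f"middle-32: {resolves(tol, RES_MIDDLE32)}  "
              f"full-64: {resolves(tol, RES_FULL64)}")
```

With these inputs, one middle-32 LSB is about 15.26 microseconds and one full-64 LSB about 233 picoseconds. Only the 10-microsecond stereo figure falls below the middle-32 resolution, while the 48 kHz sample period clears it with little headroom.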

Improving the protocol to support 64-bit timestamps throughout seems
to be the straightforward solution. With this change, all timestamps
would have the same precision, comparable to the precision of their
reference clocks. The problem with this is that our IDMS work is
based on ETSI TISPAN standard TS 183 063. Efforts have been made in
the draft to maintain compatibility with this standard. Changing the
packet format to accommodate 64-bit timestamps will unravel those
efforts.
If ETSI compatibility is important, we will have to live with 15
microsecond precision and possibly suggest clever tricks to make the
most of it.

Which way do we want to go here?

On Tue, Aug 16, 2011 at 11:12 AM, Kevin Gross wrote:
> The requirement for the stereo case is specified differently than the
> others. The 10 microsecond figure there applies to _changes in latency_, not
> static difference in latency described in all the other cases. Tolerance for
> static differences for the stereo case is probably on the order of 1
> millisecond. The first-order effect of static latency differences is to move
> one of the speakers - just depends on how fussy you are about real vs.
> effective speaker placement.
> The 10 microsecond figure is conservative (I've heard anecdotal claims of 1
> microsecond sensitivity) and derives from our ability to accurately locate
> sounds based (in part) on comparison of arrival times at each ear. The 10
> microsecond figure is not difficult to derive/check. It is also not
> difficult to set up a listening test to see what your own threshold is.
> Here's a quick reference on it -
> (see Localisation of Sound Sources section)
> Throughout, I've tried to use what I consider reasonable and verifiable
> requirements figures. Some will always seem to argue that they need better
> performance.
> Kevin
> From: Allison, Art
> Sent: Tuesday, August 16, 2011 9:07 AM
> Subject: Re: [AVTCORE] Additional use cases
> for draft-brandenburg-avt-rtcp-for-idms
> Mr. Gross,
> The timing assertion for Networked stereo loudspeakers is much tighter than
> I would have expected. Placement of stereo speakers within a foot is a
> normal rule of thumb for the casual listener - and that is about 1.1ms
> delta at room temperature. The 'golden ears' do have the ability to detect
> small differences especially with high attack waveforms, but I would like to
> see the study that showed this small time difference. Can you please provide
> a cite?
> And I wonder if the system requirement should be based on the smallest known
> delta, unless there is no implementation complexity added to convey the
> suggested resolution.
> Art Allison
> Senior Director Advanced Engineering, Science and Technology
> National Association of Broadcasters
> 1771 N Street NW
> Washington, DC 20036
> Phone 202 429 5418
> Fax 202 775 4981
> Advocacy Education Innovation
> From: Kevin Gross
> Sent: Monday, August 15, 2011 11:48 PM
> To: 'IETF AVTCore WG'
> Subject: [AVTCORE] Additional use cases
> for draft-brandenburg-avt-rtcp-for-idms
> In reviewing the IDMS draft I had volunteered to contribute additional use
> cases. Ray Brandenburg suggested that I post them here.
> Kevin Gross
> AVA Networks
> Networked stereo loudspeakers
> Because of our ability to localize sound based on inter-aural time
> differences, in a stereo listening situation, we are very sensitive to
> changes in latency between the two speakers. These changes are perceived as
> a shift in or instability of the "sound stage" during critical listening.
> These effects are readily noticeable with shifts of 10 microseconds or
> smaller. If the individual speakers in a stereo listening setup operate from
> independent network interfaces, any changing difference in latency between
> the two speakers greater than 10 microseconds will be detrimental to the
> listening experience.
> Conferencing sound reinforcement system
> A conferencing sound reinforcement system is used in commercial and
> government installations such as legislative chambers, courtrooms,
> boardrooms, classrooms (especially those supporting distance learning) and
> other such venues. Each participant using such a system has a microphone and
> a speaker. There may also be other speakers to provide reinforcement for
> non-speaking participants such as in an audience area or jury box. Each
> microphone/speaker pair is individually connected to a network and transmits
> digital audio to the other devices through the network and receives digital
> audio to be reproduced through the speaker over the network. There may be a
> central appliance which receives, prioritizes and mixes the microphone
> signals. In some systems an individual mix is created for each speaker such
> that a speaker's own voice does not come out from his speaker or from those
> immediately surrounding him.
> The objective of these systems is to provide enough gain to enhance
> intelligibility but not so much that the speaker sounds or feels amplified.
> Meeting this objective helps ensure that natural person-to-person
> communication is retained. To this end, it is desirable that the sound
> through the system and from the speakers arrive 5 to 30 milliseconds after
> the sound arriving through the air from the person speaking. Delays in
> this range invoke the Haas effect, which allows listeners to locate the
> person speaking based on the sound arriving through the air, while the sound
> reinforcement system provides the additional gain required to achieve the
> desired intelligibility. It is also desirable for the sound to come out of
> nearby speakers within 5 milliseconds of each other, as longer differential
> delays will be perceived as reverberation or echo.
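The timing targets in the conferencing use case above can be turned into a delay window for a given listener. The sketch below uses hypothetical helpers that are not part of the draft. It assumes a speed of sound of 343 m/s, a single listener, and that "system delay" means the time from the talker's microphone to the output of the listener's loudspeaker.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C


def system_delay_window(talker_to_listener_m: float,
                        loudspeaker_to_listener_m: float,
                        min_lag_s: float = 0.005,
                        max_lag_s: float = 0.030) -> tuple[float, float]:
    """Range of end-to-end system delays (microphone in to loudspeaker out)
    that make the reinforced sound reach the listener min_lag_s..max_lag_s
    after the direct sound from the talker."""
    direct_s = talker_to_listener_m / SPEED_OF_SOUND_M_S
    loudspeaker_path_s = loudspeaker_to_listener_m / SPEED_OF_SOUND_M_S
    earliest = direct_s + min_lag_s - loudspeaker_path_s
    latest = direct_s + max_lag_s - loudspeaker_path_s
    return max(earliest, 0.0), latest


def nearby_speakers_aligned(playout_times_s: list[float],
                            max_spread_s: float = 0.005) -> bool:
    """True if neighbouring loudspeakers emit within max_spread_s of each other."""
    return max(playout_times_s) - min(playout_times_s) <= max_spread_s


if __name__ == "__main__":
    low, high = system_delay_window(3.43, 0.343)
    print(f"system delay window: {low * 1e3:.1f} ms .. {high * 1e3:.1f} ms")
```

For a listener 3.43 m from the talker and 0.343 m from their own loudspeaker, the window works out to roughly 14 ms to 39 ms.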
> Video wall
> A video wall consists of multiple computer monitors, video projectors, or
> television sets tiled together contiguously or overlapped in order to form
> one large screen. Each of the screens reproduces a portion of the larger
> picture. In some implementations, each screen may be individually connected
> to the network and receive its portion of the overall image from a
> network-connected video server or video scaler. Screens are refreshed at 60
> hertz (every 16-2/3 milliseconds) or potentially faster. If the refresh is
> not synchronized, the effect of multiple screens acting as one is broken.
> Phased array transducers
> Phased array and wave field synthesis techniques are increasingly used in
> audio applications. These techniques work by sending or receiving slightly
> different versions of a signal in a spatial sampling arrangement to produce
> or record spatial and directional sound fields. The individual transducers
> in these applications can be extremely sensitive to differential latency.
> Example applications include conferencing microphone systems able to
> electronically aim at the person speaking to improve signal-to-noise ratio.
> These microphones are also able to report the location of the speaker for
> purposes of automatically aiming a video camera at that person.
> Concert sound systems called line arrays allow technicians control over the
> amount of sound sent to different places. People in the front of the
> audience can have the same loudness as those in the back. By preventing
> sound from reaching the roof and back wall of the performance space, the
> amount of reflected sound heard by the audience is reduced and the listening
> experience is improved.
> In these systems, accuracy in locating or emitting sound is related to
> differential latency through basic trigonometry. In these applications,
> microseconds of differential latency can translate to degrees of
> uncertainty. Accuracy finer than the audio sample period (about 20.8
> microseconds at the professional 48 kHz sample rate) is generally desired.
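The trigonometric relation mentioned in the phased-array use case can be made concrete for the simplest case: two transducers spaced d apart and a distant source. A plane wave arriving at angle theta from broadside reaches the two elements d*sin(theta)/c apart, so an uncompensated latency error dt near broadside looks like a source at asin(c*dt/d). The sketch below assumes c = 343 m/s and a far-field source; the 0.1 m element spacing and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at about 20 degrees C


def sample_period_s(sample_rate_hz: float) -> float:
    """Duration of one audio sample in seconds."""
    return 1.0 / sample_rate_hz


def broadside_angle_error_deg(latency_error_s: float,
                              element_spacing_m: float) -> float:
    """Apparent shift in arrival angle (degrees) for a two-element pair when an
    uncompensated differential latency latency_error_s is present.

    Far-field plane wave: arrival-time difference = spacing * sin(theta) / c,
    so at broadside (theta = 0) an error dt looks like theta = asin(c * dt / d).
    """
    ratio = SPEED_OF_SOUND_M_S * abs(latency_error_s) / element_spacing_m
    if ratio >= 1.0:
        return 90.0  # error at least as large as the end-fire delay
    return math.degrees(math.asin(ratio))


if __name__ == "__main__":
    spacing = 0.10  # metres, hypothetical element spacing
    for label, dt in [("10 us", 10e-6),
                      ("NTP middle-32 LSB", 2**-16),
                      ("48 kHz sample", sample_period_s(48_000))]:
        print(f"{label:18s} {dt * 1e6:6.2f} us -> "
              f"{broadside_angle_error_deg(dt, spacing):4.2f} deg")
```

With 0.1 m spacing, 10 microseconds corresponds to roughly 2 degrees and one 48 kHz sample period to roughly 4 degrees, which illustrates how microseconds of differential latency translate into degrees of uncertainty.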
Audio/Video Transport Core Maintenance