Re: [AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision

Aren't the 64-bit NTP timestamp and the 32-bit RTP timestamp compliant
with the ones defined in RFC 3550?

Regards!
-Qin
----- Original Message ----- 
From: "Kevin Gross" <kevin.gross@comcast.net>
To: "IETF AVTCore WG" <avt@ietf.org>
Sent: Thursday, October 20, 2011 10:59 AM
Subject: [AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision

Part of my reason for bringing these use cases to the attention of the
group relates to a yet unresolved comment I made against the IDMS
draft.

The draft proposes 64-bit timestamps for arrival times but only 32-bit
timestamps for presentation times. A 32-bit NTP timestamp has a
precision of about 15 microseconds (2^-16 seconds). Presentation time
synchronization with these timestamps does not appear to meet the
requirements of a couple of these use cases.
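
For reference, a rough sketch of where the 15 microsecond figure comes
from. It assumes the 32-bit timestamp is the usual NTP short format
(16-bit seconds, 16-bit fraction) and the 64-bit timestamp is the full
format (32-bit seconds, 32-bit fraction). The function and variable names
are illustrative only and do not come from the draft.

```python
# Smallest representable step of an NTP-style fixed-point timestamp.
# Assumption: the fractional-second field is a plain binary fraction.

def ntp_resolution_seconds(fraction_bits: int) -> float:
    """Return the timestamp step size in seconds for the given fraction width."""
    return 2.0 ** -fraction_bits

# 32-bit short format: 16-bit seconds + 16-bit fraction
short_format = ntp_resolution_seconds(16)
# 64-bit full format: 32-bit seconds + 32-bit fraction
full_format = ntp_resolution_seconds(32)

# The stereo loudspeaker use case asks for 10 microseconds or better.
STEREO_REQUIREMENT_S = 10e-6

print(f"32-bit NTP timestamp step: {short_format * 1e6:.2f} microseconds")
print(f"64-bit NTP timestamp step: {full_format * 1e12:.0f} picoseconds")
print("32-bit meets 10 us requirement:", short_format <= STEREO_REQUIREMENT_S)
print("64-bit meets 10 us requirement:", full_format <= STEREO_REQUIREMENT_S)
```

Under these assumptions the 32-bit step (about 15.26 microseconds) is
coarser than the 10 microsecond stereo figure, while the 64-bit step is
well below it.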

Improving the protocol to support 64-bit timestamps throughout seems
to be the straightforward solution. With this change, all timestamps
would have the same precision and would have a precision comparable to
their reference clocks. The problem with this is that our IDMS work is
based on ETSI TISPAN standard TS 183 063. Efforts have been made in
the draft to maintain compatibility with this standard. Changing the
packet format to accommodate 64-bit timestamps will unravel those
efforts.

If ETSI compatibility is important, we will have to live with 15
microsecond precision and possibly suggest clever tricks to make the
most of it.

Which way do we want to go here?

On Tue, Aug 16, 2011 at 11:12 AM, Kevin Gross <kevin.gross@comcast.net> wrote:
> The requirement for the stereo case is specified differently than the
> others. The 10 microsecond figure there applies to _changes in latency_, not
> to the static differences in latency described in all the other cases.
> Tolerance for static differences in the stereo case is probably on the order
> of 1 millisecond. The first-order effect of static latency differences is to
> move one of the speakers – just depends on how fussy you are about real vs.
> effective speaker placement.
>
>
>
> The 10 microsecond figure is conservative (I’ve heard anecdotal claims of 1
> microsecond sensitivity) and derives from our ability to accurately locate
> sounds based (in part) on comparison of arrival times at each ear. The 10
> microsecond figure is not difficult to derive/check. It is also not
> difficult to set up a listening test to see what your own threshold is.
> Here’s a quick reference on it - http://www.cs.ucc.ie/~ianp/CS2511/HAP.html
> (see Localisation of Sound Sources section)
>
>
>
> Throughout, I’ve tried to use what I consider reasonable and verifiable
> requirements figures. Some will always seem to argue that they need better
> performance.
>
>
>
> Kevin
>
>
>
> From: avt-bounces@ietf.org [mailto:avt-bounces@ietf.org] On Behalf Of
> Allison, Art
> Sent: Tuesday, August 16, 2011 9:07 AM
> To: IETF AVTCore WG
> Subject: Re: [AVTCORE] Additional use cases
> for draft-brandenburg-avt-rtcp-for-idms
>
>
>
> Mr. Gross,
>
> The timing assertion for Networked stereo loudspeakers is much tighter than
> I would have expected. Placement of stereo speakers within a foot is a
> normal rule of thumb for the casual listener – and that is about 0.9 ms
> delta at room temperature. The ‘golden ears’ do have the ability to detect
> small differences especially with high attack waveforms, but I would like to
> see the study that showed this small time difference. Can you please provide
> a cite?
>
> And I wonder if the system requirement should be based on the smallest known
> delta, unless there is no implementation complexity added to convey the
> suggested resolution.
>
>
>
>
>
> Art Allison
> Senior Director Advanced Engineering, Science and Technology
> National Association of Broadcasters
> 1771 N Street NW
> Washington, DC 20036
> Phone 202 429 5418
> Fax 202 775 4981
> www.nab.org
> Advocacy Education Innovation
>
> From: avt-bounces@ietf.org [mailto:avt-bounces@ietf.org] On Behalf Of Kevin
> Gross
> Sent: Monday, August 15, 2011 11:48 PM
> To: 'IETF AVTCore WG'
> Subject: [AVTCORE] Additional use cases
> for draft-brandenburg-avt-rtcp-for-idms
>
>
>
> In reviewing the IDMS draft I had volunteered to contribute additional use
> cases. Ray Brandenburg suggested that I post them here.
>
>
>
> Kevin Gross
>
> AVA Networks
>
> Networked stereo loudspeakers
>
> Because of our ability to localize sound based on inter-aural time
> differences, in a stereo listening situation, we are very sensitive to
> changes in latency between the two speakers. These changes are perceived as
> a shift in or instability of the “sound stage” during critical listening.
> These effects are readily noticeable with shifts of 10 microseconds or
> smaller. If the individual speakers in a stereo listening setup operate from
> independent network interfaces any changing difference in latency between
> the two speakers greater than 10 microseconds will be detrimental to the
> listening experience.
>
> Conferencing sound reinforcement system
>
> A conferencing sound reinforcement system is used in commercial and
> government installations such as legislative chambers, courtrooms,
> boardrooms, classrooms (especially those supporting distance learning) and
> other such venues. Each participant using such a system has a microphone and
> a speaker. There may also be other speakers to provide reinforcement for
> non-speaking participants such as in an audience area or jury box. Each
> microphone/speaker pair is individually connected to a network and transmits
> digital audio to the other devices through the network and receives digital
> audio to be reproduced through the speaker over the network. There may be a
> central appliance which receives, prioritizes and mixes the microphone
> signals. In some systems an individual mix is created for each speaker such
> that a speaker’s own voice does not come out from his speaker or from those
> immediately surrounding him.
>
> The objective of these systems is to provide enough gain to enhance
> intelligibility but not so much that the speaker sounds or feels amplified.
> Meeting this objective helps ensure that natural person-to-person
> communication is retained. To this end, it is desirable that the sound
> through the system and from the speakers arrive 5 to 30 milliseconds after
> the sound arriving through the air from the person speaking. Delays in
> this range invoke the Haas effect which allows listeners to locate the
> person speaking based on the sound arriving through the air while the sound
> reinforcement system provides the additional gain required to achieve
> desired intelligibility. It is also desirable for the sound to come out of
> nearby speakers within 5 milliseconds of each other, as longer differential
> delays will be perceived as reverberation or echo.
>
> Video wall
>
> A video wall consists of multiple computer monitors, video projectors, or
> television sets tiled together contiguously or overlapped in order to form
> one large screen. Each of the screens reproduces a portion of the larger
> picture. In some implementations, each screen may be individually connected
> to the network and receive its portion of the overall image from a
> network-connected video server or video scaler. Screens are refreshed at 60
> hertz (every 16-2/3 milliseconds) or potentially faster. If the refresh is
> not synchronized, the effect of multiple screens acting as one is broken.
>
> Phased array transducers
>
> Phased array and wave field synthesis techniques are increasingly used in
> audio applications. These techniques work by sending or receiving slightly
> different versions of a signal in a spatial sampling arrangement to produce
> or record spatial and directional sound fields. The individual transducers
> in these applications can be extremely sensitive to differential latency.
> Example applications include conferencing microphone systems able to
> electronically aim at the person speaking to improve signal-to-noise ratio.
> These microphones are also able to report the location of the speaker for
> purposes of automatically aiming a video camera at them.
>
> Concert sound systems called line arrays allow technicians control over the
> amount of sound sent to different places. People in the front of the
> audience can have the same loudness as those in the back. By preventing
> sound from reaching the roof and back wall of the performance space, the
> amount of reflected sound heard by the audience is reduced and the listening
> experience is improved.
>
> In these systems, accuracy in locating or emitting sound is related to
> differential latency through basic trigonometry. In these applications,
> microseconds of differential latency can translate to degrees of
> uncertainty. Accuracy greater than the audio sample period (about 20
> microseconds for professional 48 kHz sample rate) is generally desired.
>
>
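
As a rough illustration of the phased-array point quoted above, the sketch
below converts a timing error into an angular error for a simple
two-element array. The far-field geometry, the 343 m/s speed of sound, the
0.10 m element spacing, and all names are assumptions chosen for the
example. None of them come from the original use case text.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed: dry air at roughly 20 degrees C
ELEMENT_SPACING_M = 0.10     # hypothetical spacing between two transducers

def sample_period_us(sample_rate_hz: float) -> float:
    """Audio sample period in microseconds."""
    return 1e6 / sample_rate_hz

def angle_error_deg(timing_error_s: float, spacing_m: float) -> float:
    """Angular error near broadside caused by a differential timing error.

    Far-field, two-element model: sin(theta) = c * tau / d.
    """
    ratio = SPEED_OF_SOUND_M_S * timing_error_s / spacing_m
    if abs(ratio) > 1.0:
        raise ValueError("timing error exceeds the maximum delay across the array")
    return math.degrees(math.asin(ratio))

period_us = sample_period_us(48_000)
print(f"48 kHz sample period: {period_us:.2f} microseconds")

# Compare 1 us, 10 us and one full sample period of differential latency.
for err_us in (1.0, 10.0, period_us):
    err_deg = angle_error_deg(err_us * 1e-6, ELEMENT_SPACING_M)
    print(f"{err_us:6.2f} us timing error -> {err_deg:.2f} degrees")
```

With this assumed spacing, one sample period of differential latency at
48 kHz corresponds to roughly 4 degrees of angular uncertainty. Wider
spacing reduces the angular error for the same timing error.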
_______________________________________________
Audio/Video Transport Core Maintenance
avt@ietf.org
https://www.ietf.org/mailman/listinfo/avt