[AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision

Kevin Gross <kevin.gross@comcast.net> Thu, 20 October 2011 03:00 UTC

Return-Path: <kevin.gross@comcast.net>
X-Original-To: avt@ietfa.amsl.com
Delivered-To: avt@ietfa.amsl.com
Received: from localhost (localhost []) by ietfa.amsl.com (Postfix) with ESMTP id 80D1011E80D1 for <avt@ietfa.amsl.com>; Wed, 19 Oct 2011 20:00:03 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -101.977
X-Spam-Status: No, score=-101.977 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, FM_FORGED_GMAIL=0.622, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([]) by localhost (ietfa.amsl.com []) (amavisd-new, port 10024) with ESMTP id ag1+Tk-lzXgk for <avt@ietfa.amsl.com>; Wed, 19 Oct 2011 20:00:02 -0700 (PDT)
Received: from qmta06.westchester.pa.mail.comcast.net (qmta06.westchester.pa.mail.comcast.net []) by ietfa.amsl.com (Postfix) with ESMTP id F1E5B11E80C2 for <avt@ietf.org>; Wed, 19 Oct 2011 20:00:01 -0700 (PDT)
Received: from omta20.westchester.pa.mail.comcast.net ([]) by qmta06.westchester.pa.mail.comcast.net with comcast id mqyN1h0011YDfWL56r02da; Thu, 20 Oct 2011 03:00:02 +0000
Received: from mail-ey0-f172.google.com ([]) by omta20.westchester.pa.mail.comcast.net with comcast id mr011h00l3jky9M3gr02KW; Thu, 20 Oct 2011 03:00:02 +0000
Received: by eyg24 with SMTP id 24so2501884eyg.31 for <avt@ietf.org>; Wed, 19 Oct 2011 19:59:59 -0700 (PDT)
MIME-Version: 1.0
Received: by with SMTP id c11mr15171760fah.24.1319079599282; Wed, 19 Oct 2011 19:59:59 -0700 (PDT)
Received: by with HTTP; Wed, 19 Oct 2011 19:59:59 -0700 (PDT)
Date: Wed, 19 Oct 2011 20:59:59 -0600
Message-ID: <CALw1_Q31aOWq0i5HzCoBEqCWM53JW7a6FrntJ9LcUTm2UJFujA@mail.gmail.com>
From: Kevin Gross <kevin.gross@comcast.net>
To: IETF AVTCore WG <avt@ietf.org>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable
X-Mailman-Approved-At: Thu, 20 Oct 2011 12:39:32 -0700
Subject: [AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision
X-BeenThere: avt@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Audio/Video Transport Core Maintenance <avt.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/avt>, <mailto:avt-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/avt>
List-Post: <mailto:avt@ietf.org>
List-Help: <mailto:avt-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/avt>, <mailto:avt-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 20 Oct 2011 03:14:23 -0000

Part of my reason for bringing these use cases to the attention of the
group relates to a yet unresolved comment I made against the IDMS

The draft proposes 64-bit timestamps for arrival times but only 32-bit
timestamps for presentation times. A 32-bit NTP timestamp has a 15
microsecond precision. Presentation time synchronization with these
timestamps would seem not to clearly meet the requirements of a couple
of these use cases.

Improving the protocol to support 64-bit timestamps throughout seems
to be the straightforward solution. With this change, all timestamps
would have the same precision and would have a precision comparable to
their reference clocks. The problem with this is that our IDMS work is
based on ETSI TISPAN standard TS 183 063. Efforts have been made in
the draft to maintain compatibility with this standard. Changing the
packet format to accommodate 64-bit timestamps will unravel those

If ETSI compatibility is important, we will have to live with 15
microsecond precision and possibly suggest clever tricks to make the
most of it.

Which way do we want to go here?

On Tue, Aug 16, 2011 at 11:12 AM, Kevin Gross <kevin.gross@comcast.net> wrote:
> The requirement for the stereo case is specified differently than the
> others. The 10 microsecond figure there applies to _changes in latency_, not
> static difference in latency described in all the other cases. Tolerance for
> static differences for the stereo case are probably on the order of 1
> millisecond. The first-order effect of static latency differences is to move
> one of the speakers – just depends on how fussy you are about real vs.
> effective speaker placement.
> The 10 microsecond figure is conservative (I’ve heard anecdotal claims of 1
> microseconds sensitivity) and derives from our ability to accurately locate
> sounds based (in part) on comparison of arrival times at each ear. The 10
> microsecond figure is not difficult to derive/check. It is also not
> difficult to set up a listening test to see what your own threshold is.
> Here’s a quick reference on it - http://www.cs.ucc.ie/~ianp/CS2511/HAP.html
> (see Localisation of Sound Sources section)
> Throughout, I’ve tried to use what I consider reasonable and verifiable
> requirements figures. Some will always seem to argue that they need better
> performance.
> Kevin
> From: avt-bounces@ietf.org [mailto:avt-bounces@ietf.org] On Behalf Of
> Allison, Art
> Sent: Tuesday, August 16, 2011 9:07 AM
> Subject: Re: [AVTCORE] Additional use cases
> fordraft-brandenburg-avt-rtcp-for-idms
> Mr. Gross,
> The timing assertion for Networked stereo loudspeakers is much tighter that
> I would have expected. Placement of stereo speakers within a foot is a
> normal rule of thumb for the casual listener – and that  is about 1.1ms
> delta at room temperature. The ‘ golden ears’ do have ability to detect
> small differences especially with high attack waveforms, but I would like to
> see the study that showed this small time difference. Can you please provide
> a cite?
> And I wonder if the system requirement should be based on the smallest known
> delta, unless there is no implementation complexity added to convey the
> suggested resolution.
> Art Allison
> Senior Director Advanced Engineering, Science and Technology
> National Association of Broadcasters
> 1771 N Street NW
> Washington, DC 20036
> Phone  202 429 5418
> Fax  202 775 4981
> www.nab.org
> Advocacy  Education  Innovation
> From: avt-bounces@ietf.org [mailto:avt-bounces@ietf.org] On Behalf Of Kevin
> Gross
> Sent: Monday, August 15, 2011 11:48 PM
> To: 'IETF AVTCore WG'
> Subject: [AVTCORE] Additional use cases
> fordraft-brandenburg-avt-rtcp-for-idms
> In reviewing the IDMS draft I had volunteered to contribute additional use
> cases. Ray Brandenberg suggested that I post them here.
> Kevin Gross
> AVA Networks
> Networked stereo loudspeakers
> Because of our ability to localize sound based on inter-aural time
> differences, in a stereo listening situation, we are very sensitive to
> changes in latency between the two speakers. These changes are perceived as
> a shift in or instability of the “sound stage” during critical listening.
> These effects are readily noticeable with shifts of 10 microseconds or
> smaller. If the individual speakers in a stereo listening setup operate from
> independent network interfaces any changing difference in latency between
> the two speakers greater than 10 microseconds will be detrimental to the
> listening experience.
> Conferencing sound reinforcement system
> A conferencing sound reinforcement system is used in commercial and
> government installations such as legislative chambers, courtrooms,
> boardrooms, classrooms (especially those supporting distance learning) and
> other such venues. Each participant using such a system has a microphone and
> a speaker. There may also be other speakers to provide reinforcement for
> non-speaking participants such as in an audience area or jury box. Each
> microphone/speaker pair is individually connected to a network and transmits
> digital audio to the other devices through the network and receives digital
> audio to be reproduced through the speaker over the network. There may be a
> central appliance which receives, prioritizes and mixes the microphone
> signals. In some systems an individual mix is created for each speaker such
> that a speaker’s own voice does not come out from his speaker or from those
> immediately surrounding him.
> The objective of these systems is to provide enough gain to enhance
> intelligibility but not so much that the speaker sounds or feels amplified.
> Meeting this objective helps insure that natural person-to-person
> communication is retained. To this end, it is desirable that the sound
> through the system and from the speakers arrive 5 to 30 milliseconds after
> the the sound arriving through the air from the person speaking. Delays in
> this range invoke the Haas effect which allows listeners to locate the
> person speaking based on the sound arriving through the air while the sound
> reinforcement system provides the additional gain required to achieve
> desired intelligibility. It is also desirable for the sound to come out of
> nearby speakers at within 5 milliseconds as longer differential delays will
> be perceived as reverberation or echo.
> Video wall
> A video wall consists of multiple computer monitors, video projectors, or
> television sets tiled together contiguously or overlapped in order to form
> one large screen.# Each of the screens reproduces a portion of the larger
> picture. In some implementations, each screen may be individually connected
> to the network and receive its portion of the overall image from a
> network-connected video server or video scaler. Screens are refreshed at 60
> hertz (every 16-2/3 milliseconds) or potentially faster. If the refresh is
> not synchronized, the effect of multiple screens acting as one is broken.
> Phased array transducers
> Phased array and wave field synthesis techniques are increasingly used in
> audio applications. These techniques work by sending or receiving slightly
> different versions of a signal in a spacial sampling arrangement to produce
> or record spacial and directional sound fields. The individual transducers
> in these applications can be extremely sensitive to differential latency.
> Example applications include conferencing microphone systems able to
> electronically aim at the person speaking to improve signal to noise ratio.
> These microphones are also able to report the location of the speaker for
> purposes of automatically aiming a video camera at them.
> Concert sound systems called line arrays allow technicians control over the
> amount of sound sent to different places. People in the front of the
> audience can have the same loudness as those in the back. By preventing
> sound from reaching the roof and back wall of the performance space, the
> amount of reflected sound heard by the audience is reduced and the listening
> experience is improved.
> In these systems, accuracy in locating or emitting sound is related to
> differential latency through basic trigonometry. In these applications,
> microseconds of differential latency can translate to degrees of
> uncertainty. Accuracy greater than the audio sample period (about 20
> microseconds for professional 48 kHz sample rate) is generally desired.