Re: [AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision

I don't think packet size is a primary consideration in this decision. This
issue applies to small and relatively infrequent RTCP packets and not the
larger, more frequent RTP media packets.

Kevin

On Fri, Oct 21, 2011 at 2:32 AM, Qin Wu <bill.wu@huawei.com> wrote:

> Is it important to save packet size by using 32 bits at the cost of accuracy?
> It would be good to see the presentation times follow the 64-bit NTP
> timestamp format defined in RFC 3550 for SR/RR reports,
> unless you have a stronger justification than saying that this is
> not ETSI TISPAN standard compliant.
> But I am not the judge; this is just my personal opinion.
>
> Regards!
> -Qin
> ----- Original Message -----
> From: "Brandenburg, R. (Ray) van" <ray.vanbrandenburg@tno.nl>
> To: "Qin Wu" <bill.wu@huawei.com>; "Kevin Gross" <kevin.gross@comcast.net>;
> "IETF AVTCore WG" <avt@ietf.org>
> Sent: Friday, October 21, 2011 3:13 PM
> Subject: RE: [AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision
>
>
> Hi Qin,
>
> Yes. However, the current IDMS draft uses a 64bit NTP timestamp to report
> on Packet Receipt Times and a 32bit NTP timestamp to report on Packet
> Presentation Times (plus a 32bit timestamp for the RTP timestamp). The
> reason we initially went with only a 32bit NTP timestamp for Packet
> Presentation times was that we figured that the 16 most significant bits of
> the NTP time (e.g. the date) would be the same for packet presentation and
> packet reception times and that the least significant 16bits (for sub 15
> microsecond accuracy) would not be necessary. We therefore went with the
> middle 32 bits to limit the length of the IDMS packet.
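
The "middle 32 bits" choice described above can be sketched as follows. This is only an illustration under the usual NTP layout (32 bits of seconds, 32 bits of fraction); the function names are hypothetical and do not come from the IDMS draft or the ETSI TISPAN specification.

```python
# Sketch only: function names are illustrative, not from the IDMS draft.

def ntp64_to_ntp32(ntp64: int) -> int:
    """Keep the middle 32 bits: low 16 bits of seconds, high 16 bits of fraction."""
    return (ntp64 >> 16) & 0xFFFFFFFF


def ntp32_to_seconds(ntp32: int) -> float:
    """Read a 32-bit value as 16.16 fixed-point seconds."""
    return ntp32 / 65536.0


def make_ntp64(seconds: int, fraction_seconds: float) -> int:
    """Build a 64-bit NTP value from whole seconds and a fraction in [0, 1)."""
    return (seconds << 32) | int(fraction_seconds * 2**32)


# A 10 microsecond offset lives entirely in the dropped low 16 bits.
lost = ntp64_to_ntp32(make_ntp64(5, 10e-6))
# A 20 microsecond offset survives as a single 1/65536 s tick.
kept = ntp64_to_ntp32(make_ntp64(5, 20e-6))

print(ntp32_to_seconds(lost))  # 5.0
print(ntp32_to_seconds(kept))  # slightly above 5.0
```

In this sketch any offset smaller than one 1/65536 s tick vanishes during truncation, which is the loss of accuracy under discussion.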
>
> What Kevin now suggests is that in some use cases sub-15 microsecond
> accuracy is necessary. He therefore proposes to increase the Packet
> Presentation timestamp to 64bits. The only disadvantage with this approach
> (apart from a slightly larger packet) is that this would break compatibility
> with the ETSI TISPAN standard.
>
> Which way do you think we should go here?
>
> Best Regards,
>
> Ray
>
>
>
> -----Original Message-----
> From: avt-bounces@ietf.org [mailto:avt-bounces@ietf.org] On Behalf Of Qin
> Wu
> Sent: Friday, October 21, 2011 4:48
> To: Kevin Gross; IETF AVTCore WG
> Subject: Re: [AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision
>
> Aren't the 64-bit NTP timestamp and 32-bit RTP timestamp compliant with the
> ones defined in RFC 3550?
>
> Regards!
> -Qin
> ----- Original Message -----
> From: "Kevin Gross" <kevin.gross@comcast.net>
> To: "IETF AVTCore WG" <avt@ietf.org>
> Sent: Thursday, October 20, 2011 10:59 AM
> Subject: [AVTCORE] draft-ietf-avtcore-idms-01 timestamp precision
>
>
> Part of my reason for bringing these use cases to the attention of the
> group relates to a yet unresolved comment I made against the IDMS
> draft.
>
> The draft proposes 64-bit timestamps for arrival times but only 32-bit
> timestamps for presentation times. A 32-bit NTP timestamp has a 15
> microsecond precision. Presentation time synchronization with these
> timestamps does not appear to meet the requirements of a couple
> of these use cases.
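
The 15 microsecond figure follows from the number of fraction bits in each format. A minimal sketch, assuming the 32-bit format carries 16 fraction bits and the 64-bit format carries 32; the constant and function names are illustrative only.

```python
# Resolution of an NTP timestamp is one unit of its binary fraction.
NTP32_FRACTION_BITS = 16  # 32-bit short format: 16 bits seconds, 16 bits fraction
NTP64_FRACTION_BITS = 32  # 64-bit format: 32 bits seconds, 32 bits fraction


def resolution_seconds(fraction_bits: int) -> float:
    """Smallest time step representable with the given number of fraction bits."""
    return 2.0 ** -fraction_bits


ntp32_res = resolution_seconds(NTP32_FRACTION_BITS)
ntp64_res = resolution_seconds(NTP64_FRACTION_BITS)

print(f"32-bit: {ntp32_res * 1e6:.2f} us")   # 32-bit: 15.26 us
print(f"64-bit: {ntp64_res * 1e12:.0f} ps")  # 64-bit: 233 ps

# The 10 microsecond stereo requirement is finer than one 32-bit tick.
print(ntp32_res > 10e-6)  # True
```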
>
> Improving the protocol to support 64-bit timestamps throughout seems
> to be the straightforward solution. With this change, all timestamps
> would have the same precision and would have a precision comparable to
> their reference clocks. The problem with this is that our IDMS work is
> based on ETSI TISPAN standard TS 183 063. Efforts have been made in
> the draft to maintain compatibility with this standard. Changing the
> packet format to accommodate 64-bit timestamps will unravel those
> efforts.
>
> If ETSI compatibility is important, we will have to live with 15
> microsecond precision and possibly suggest clever tricks to make the
> most of it.
>
> Which way do we want to go here?
>
> On Tue, Aug 16, 2011 at 11:12 AM, Kevin Gross <kevin.gross@comcast.net>
> wrote:
> > The requirement for the stereo case is specified differently than the
> > others. The 10 microsecond figure there applies to _changes in latency_,
> > not the static difference in latency described in all the other cases.
> > Tolerance for static differences in the stereo case is probably on the
> > order of 1 millisecond. The first-order effect of static latency
> > differences is to move one of the speakers; it just depends on how fussy
> > you are about real vs. effective speaker placement.
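
The "move one of the speakers" effect can be put into numbers by converting a latency difference into the distance sound travels in that time. A rough sketch, assuming a speed of sound of 343 m/s (air at about 20 degrees C); that value and the function name are assumptions, not figures from this thread.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C


def equivalent_distance_m(latency_s: float, c: float = SPEED_OF_SOUND_M_S) -> float:
    """Distance a speaker would appear to move for a given static latency offset."""
    return latency_s * c


static_offset = equivalent_distance_m(1e-3)    # 1 millisecond
dynamic_offset = equivalent_distance_m(10e-6)  # 10 microseconds

print(f"1 ms  ~ {static_offset * 100:.1f} cm")   # about 34 cm
print(f"10 us ~ {dynamic_offset * 1000:.2f} mm")  # about 3.4 mm
```

Under that assumption a 1 millisecond static offset corresponds to moving a speaker by roughly a third of a metre, while a 10 microsecond change corresponds to a few millimetres.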
> >
> >
> >
> > The 10 microsecond figure is conservative (I've heard anecdotal claims of
> > 1 microsecond sensitivity) and derives from our ability to accurately
> > locate sounds based (in part) on comparison of arrival times at each ear.
> > The 10 microsecond figure is not difficult to derive/check. It is also not
> > difficult to set up a listening test to see what your own threshold is.
> > Here's a quick reference on it:
> > http://www.cs.ucc.ie/~ianp/CS2511/HAP.html
> > (see Localisation of Sound Sources section)
> >
> >
> >
> > Throughout, I've tried to use what I consider reasonable and verifiable
> > requirements figures. Some will always seem to argue that they need
> > better performance.
> >
> >
> >
> > Kevin
> >
> >
> >
> > From: avt-bounces@ietf.org [mailto:avt-bounces@ietf.org] On Behalf Of
> > Allison, Art
> > Sent: Tuesday, August 16, 2011 9:07 AM
> > To: IETF AVTCore WG
> > Subject: Re: [AVTCORE] Additional use cases
> > for draft-brandenburg-avt-rtcp-for-idms
> >
> >
> >
> > Mr. Gross,
> >
> > The timing assertion for Networked stereo loudspeakers is much tighter
> > than I would have expected. Placement of stereo speakers within a foot is
> > a normal rule of thumb for the casual listener - and that is about 1.1ms
> > delta at room temperature. The 'golden ears' do have the ability to
> > detect small differences, especially with high attack waveforms, but I
> > would like to see the study that showed this small time difference. Can
> > you please provide a cite?
> >
> > And I wonder if the system requirement should be based on the smallest
> > known delta, unless there is no implementation complexity added to convey
> > the suggested resolution.
> >
> >
> >
> >
> >
> > Art Allison
> > Senior Director Advanced Engineering, Science and Technology
> > National Association of Broadcasters
> > 1771 N Street NW
> > Washington, DC 20036
> > Phone 202 429 5418
> > Fax 202 775 4981
> > www.nab.org
> > Advocacy Education Innovation
> >
> > From: avt-bounces@ietf.org [mailto:avt-bounces@ietf.org] On Behalf Of
> > Kevin Gross
> > Sent: Monday, August 15, 2011 11:48 PM
> > To: 'IETF AVTCore WG'
> > Subject: [AVTCORE] Additional use cases
> > for draft-brandenburg-avt-rtcp-for-idms
> >
> >
> >
> > In reviewing the IDMS draft I had volunteered to contribute additional
> > use cases. Ray van Brandenburg suggested that I post them here.
> >
> >
> >
> > Kevin Gross
> >
> > AVA Networks
> >
> > Networked stereo loudspeakers
> >
> > Because of our ability to localize sound based on inter-aural time
> > differences, in a stereo listening situation, we are very sensitive to
> > changes in latency between the two speakers. These changes are perceived
> > as a shift in or instability of the "sound stage" during critical
> > listening. These effects are readily noticeable with shifts of 10
> > microseconds or smaller. If the individual speakers in a stereo listening
> > setup operate from independent network interfaces, any changing difference
> > in latency between the two speakers greater than 10 microseconds will be
> > detrimental to the listening experience.
> >
> > Conferencing sound reinforcement system
> >
> > A conferencing sound reinforcement system is used in commercial and
> > government installations such as legislative chambers, courtrooms,
> > boardrooms, classrooms (especially those supporting distance learning)
> > and other such venues. Each participant using such a system has a
> > microphone and a speaker. There may also be other speakers to provide
> > reinforcement for non-speaking participants such as in an audience area
> > or jury box. Each microphone/speaker pair is individually connected to a
> > network and transmits digital audio to the other devices through the
> > network and receives digital audio to be reproduced through the speaker
> > over the network. There may be a central appliance which receives,
> > prioritizes and mixes the microphone signals. In some systems an
> > individual mix is created for each speaker such that a speaker's own
> > voice does not come out from his speaker or from those immediately
> > surrounding him.
> >
> > The objective of these systems is to provide enough gain to enhance
> > intelligibility but not so much that the speaker sounds or feels
> > amplified. Meeting this objective helps ensure that natural
> > person-to-person communication is retained. To this end, it is desirable
> > that the sound through the system and from the speakers arrive 5 to 30
> > milliseconds after the sound arriving through the air from the person
> > speaking. Delays in this range invoke the Haas effect, which allows
> > listeners to locate the person speaking based on the sound arriving
> > through the air while the sound reinforcement system provides the
> > additional gain required to achieve desired intelligibility. It is also
> > desirable for the sound to come out of nearby speakers within 5
> > milliseconds of each other, as longer differential delays will be
> > perceived as reverberation or echo.
> >
> > Video wall
> >
> > A video wall consists of multiple computer monitors, video projectors, or
> > television sets tiled together contiguously or overlapped in order to
> > form one large screen. Each of the screens reproduces a portion of the
> > larger picture. In some implementations, each screen may be individually
> > connected to the network and receive its portion of the overall image
> > from a network-connected video server or video scaler. Screens are
> > refreshed at 60 hertz (every 16-2/3 milliseconds) or potentially faster.
> > If the refresh is not synchronized, the effect of multiple screens acting
> > as one is broken.
> >
> > Phased array transducers
> >
> > Phased array and wave field synthesis techniques are increasingly used in
> > audio applications. These techniques work by sending or receiving
> > slightly different versions of a signal in a spatial sampling arrangement
> > to produce or record spatial and directional sound fields. The individual
> > transducers in these applications can be extremely sensitive to
> > differential latency. Example applications include conferencing
> > microphone systems able to electronically aim at the person speaking to
> > improve signal to noise ratio. These microphones are also able to report
> > the location of the speaker for purposes of automatically aiming a video
> > camera at them.
> >
> > Concert sound systems called line arrays allow technicians control over
> > the amount of sound sent to different places. People in the front of the
> > audience can have the same loudness as those in the back. By preventing
> > sound from reaching the roof and back wall of the performance space, the
> > amount of reflected sound heard by the audience is reduced and the
> > listening experience is improved.
> >
> > In these systems, accuracy in locating or emitting sound is related to
> > differential latency through basic trigonometry. In these applications,
> > microseconds of differential latency can translate to degrees of
> > uncertainty. Accuracy greater than the audio sample period (about 20
> > microseconds for professional 48 kHz sample rate) is generally desired.
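
The "basic trigonometry" mentioned above can be sketched for the simplest case. This assumes a far-field source, two transducers a fixed distance apart, and a speed of sound of 343 m/s; the 0.1 m spacing and all names are illustrative assumptions, not values from this thread.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C
SAMPLE_RATE_HZ = 48_000     # professional sample rate cited in the text


def sample_period_us(rate_hz: float) -> float:
    """Audio sample period in microseconds."""
    return 1e6 / rate_hz


def angle_error_deg(latency_error_s: float, spacing_m: float,
                    c: float = SPEED_OF_SOUND_M_S) -> float:
    """Apparent angle off broadside caused by a differential latency error
    between two transducers spaced spacing_m apart (far-field model)."""
    ratio = c * latency_error_s / spacing_m
    if abs(ratio) > 1.0:
        raise ValueError("latency error exceeds the acoustic travel time across the array")
    return math.degrees(math.asin(ratio))


print(f"sample period: {sample_period_us(SAMPLE_RATE_HZ):.1f} us")
# With 0.1 m spacing (assumed), 5 us of error is on the order of one degree.
print(f"angle error: {angle_error_deg(5e-6, 0.1):.2f} deg")
```

With these assumed numbers a few microseconds of differential latency already shifts the apparent direction by about a degree, and a full sample period at 48 kHz shifts it by several degrees.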
> >
> >
> _______________________________________________
> Audio/Video Transport Core Maintenance
> avt@ietf.org
> https://www.ietf.org/mailman/listinfo/avt