Re: [AVT] VMR-WB RTP Payload and Storage Formats

Randell Jesup <rjesup@wgate.com> Mon, 04 October 2004 19:56 UTC

To: sassan.ahmadi@nokia.com
Subject: Re: [AVT] VMR-WB RTP Payload and Storage Formats
References: <0B08EA1BF5F6304992CDC985EE02209E02A7437B@sdebe002.americas.nokia.com>
From: Randell Jesup <rjesup@wgate.com>
Date: Mon, 04 Oct 2004 15:38:31 -0400
In-Reply-To: <0B08EA1BF5F6304992CDC985EE02209E02A7437B@sdebe002.americas.nokia.com>
Message-ID: <ybuu0ta46ko.fsf@jesup.eng.tvol.net.jesup.eng.tvol.net>
Cc: magnus.westerlund@ericsson.com, csp@csperkins.org, avt@ietf.org

sassan.ahmadi@nokia.com writes:
>In the one hand, based on the distinctive capability of VMR-WB, there are
>people who want to use a fixed RTP clock rate of 16000 Hz to enable
>processing/injection of the 8000 Hz sampled media. Note that 8000 and
>16000 Hz sampled media have identical VMR-WB output frames. I believe
>there is technically nothing wrong and revision -04 of the I-D
>appropriately addresses this concern.

        This is a very relevant point - the decoder doesn't need to know
the input sample rate, and certainly doesn't need to know it from the timestamp.

>On the other hand, you persist on your opinion that RTP clock rate must be
>identical to the input media sampling rate regardless of the codec
>capabilities.
>
>The following excerpt from Section 4.1 of RFC 3551 (line 434)
>
>"...
>   The RTP clock rate used for generating the RTP timestamp is
>   independent of the number of channels and the encoding; it usually
>   equals the number of sampling periods per second.  For N-channel
>   encodings, each sampling period (say, 1/8,000 of a second) generates
>   N samples.  (This terminology is standard, but somewhat confusing, as
>   the total number of samples generated per second is then the sampling
>   rate times the channel count.)
>..."
>
>indicates that (there is no normative language here) the RTP clock rate
>"usually" equals the input media sampling rate and that it is independent
>of the encoding.

        As Colin has indicated, this is the traditional usage.  However, I
(personally) see no reason for this other than tradition, and Colin's
response to me didn't include any significant reason other than tradition
(I'll respond to that shortly in detail).

>Also the following excerpt from Section 6.4.4 of RFC 3550 (line 2391)
>
>"...
>   Since that timestamp is
>   independent of the clock rate for the data encoding, it is possible
>   to implement encoding- and profile-independent quality monitors.
>..."
>
>Therefore, you have no technical ground to assert that RTP clock rate MUST
>be equal to input media sampling frequency.

        In fact, in Colin's defense, I don't think Colin asserted it "MUST"
be equal to the sampling frequency, but instead he asserted (very strongly)
that it should be; that he saw no reason to break the tradition.  I (and
apparently others such as Sassan) do.

>Please think of VMR-WB as a dual-rate system where both 8000 and 16000 Hz
>sampled media are supported and that decoding can proceed without knowing
>the input media sampling frequency.

        And, if you do need to know the sampling frequency to decode, it
should either be part of the encoding or part of the SDP/etc that creates
the stream - and not the timestamp frequency portion of the SDP.


        As I see it, there are two primary uses of the RTP timestamp
and its frequency:

1) Playback timing
   Knowing when a frame was sampled (the time at which its first sample was
   taken) is important for playback, especially if the codec can omit samples
   for some reason without otherwise indicating it.  You can infer lost
   packets from sequence numbers as well, but there isn't a guaranteed
   sequence number -> time equation for many codecs.  

   Note that the time of the first sample of the frame can be in any units
   of sufficient resolution.  It's handy if it's an integer multiple of the
   actual sampling rate (1->N times), but even that might not be important
   given a high enough rate, and for certain non-audio codecs that's
   already the norm (video).

2) Stream synchronization
   The timestamp is combined with the NTP times carried in RTCP to
   synchronize streams.  Again, the same issues as above for playback
   apply; in this case higher multiples of the sampling rate might have an
   advantage in some cases.

   For example, when trying to synchronize multiple audio streams for a mix
   for music, using a higher timestamp rate when possible may allow tighter
   synchronization and less chance of introducing phasing issues.  If
   you're synchronizing a 250Hz-sampled LFE stream to video at 60fps, or
   much worse to multi-channel audio at 44KHz, you don't want the
   timestamps in the LFE stream to be 250Hz - that might allow the LFE
   channel to be as much as 2ms out-of-phase with the channels carrying the
   higher frequencies.  Now this might not matter a great deal - but you
   certainly aren't _gaining_ anything by restricting the LFE timestamps to
   250Hz.  It may become worse when you're trying to merge multiple streams
   with 250Hz-sampled audio coming from different sources; the phase issues
   may become quite apparent.

   It's like my old video character-generation example - even with a
   display that can handle at most 350 "lines" of analog video across,
   using a circa-1400 pixel source to generate the analog signal produces better
   output because you can control edge _positions_ more finely, even if you
   can't make a sharper edge transition or (same thing) go from one
   color/brightness to another any quicker.  It's an easy thing to forget
   about when you're used to working in the digital domain.
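
The "as much as 2ms" figure in the LFE example can be checked with a few
lines of Python.  This is only an illustrative sketch added for clarity,
not code from the original message; the function name and the choice of
rounding modes are assumptions.  It computes the worst-case error when an
arbitrary instant is expressed as a whole number of ticks of a given
timestamp clock.

```python
from fractions import Fraction

def quantization_error_bound(clock_rate_hz, rounding="nearest"):
    """Worst-case error, in seconds, when an instant is stored as an
    integer number of ticks of a clock running at clock_rate_hz.

    Hypothetical helper for illustration only.
    """
    tick = Fraction(1, clock_rate_hz)
    if rounding == "nearest":
        return tick / 2      # at most half a tick away from the true time
    return tick              # truncation can be off by up to a full tick

if __name__ == "__main__":
    for rate in (250, 8000, 16000, 90000):
        bound_ms = float(quantization_error_bound(rate) * 1000)
        print(f"{rate:>6} Hz clock: up to {bound_ms:.4f} ms error")
```

With round-to-nearest, a 250Hz clock gives a bound of 2ms, while a 16000Hz
clock brings it down to about 0.03ms, which is the sense in which a higher
timestamp rate allows tighter alignment between streams.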

I thought I had a third, but I think it's covered by those.


Note that for video, current codecs generally use a 90 KHz timestamp rate.
This is definitely not the actual sample rate of most/any inputs.  That's
circa 3000 ticks per 30Hz video frame, way under even QCIF sample rates (if
you're thinking video pixels).  This 90 KHz rate works for video because it
has _enough_ accuracy for the purpose.  Codec/etc writers have to assume
there isn't a perfect 1:1 correlation of timestamp to time-of-first-sample.
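
The arithmetic behind the "circa 3000 ticks" remark can be sketched as
follows.  This is an illustrative example, not part of the original
message; the helper name is hypothetical, and the QCIF dimensions
(176x144) and the 30000/1001 frame rate are supplied here as commonly
used values rather than taken from the text above.

```python
from fractions import Fraction

VIDEO_CLOCK_HZ = 90000           # timestamp rate mentioned in the text
QCIF_PIXELS = 176 * 144          # assumption: standard QCIF frame size

def ticks_per_frame(frame_rate):
    """Timestamp ticks that elapse during one video frame.

    frame_rate may be an int or a Fraction (e.g. 30000/1001).
    Hypothetical helper for illustration only.
    """
    return Fraction(VIDEO_CLOCK_HZ) / Fraction(frame_rate)

if __name__ == "__main__":
    print("30 fps:        ", ticks_per_frame(30), "ticks per frame")
    print("30000/1001 fps:", ticks_per_frame(Fraction(30000, 1001)),
          "ticks per frame")
    print("QCIF pixels per frame:", QCIF_PIXELS)
```

At 30 frames per second each frame spans 3000 ticks, far fewer than the
25344 pixels in a single QCIF frame, so the timestamp clock clearly is not
a per-sample clock; it is simply fine-grained enough for timing purposes.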


Given that RTP is not designed for easy switching of timestamp rates on a
single stream, using a single fixed "high enough" timestamp rate for a
codec with multiple input rates makes a ton of sense to me.  It definitely
simplifies intermediate pieces and simplifies receivers as well - given the
apparent complexity of dealing with multiple on-the-fly timestamp rates.
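
As a rough sketch of what a single fixed rate buys you (illustrative
only, not from the original message): assuming 20 ms codec frames, which
is an assumption made here and not stated in the text above, a fixed
16000 Hz timestamp clock advances by the same amount per frame whether
the encoder input was sampled at 8000 Hz or 16000 Hz.  The names below
are hypothetical.

```python
RTP_CLOCK_HZ = 16000     # fixed timestamp rate discussed in this thread
FRAME_MS = 20            # assumption: 20 ms codec frames

def samples_per_frame(input_rate_hz):
    """Input samples consumed by the encoder for one frame."""
    return input_rate_hz * FRAME_MS // 1000

def timestamp_step():
    """Timestamp increment per frame; independent of the input rate."""
    return RTP_CLOCK_HZ * FRAME_MS // 1000

def next_timestamp(ts, frames=1):
    """Advance a 32-bit RTP timestamp by a number of frames, with wrap."""
    return (ts + frames * timestamp_step()) & 0xFFFFFFFF

if __name__ == "__main__":
    for rate in (8000, 16000):
        print(f"{rate} Hz input: {samples_per_frame(rate)} samples/frame, "
              f"timestamp step {timestamp_step()}")
```

Under these assumptions the receiver and any intermediate box see a
constant step of 320 ticks per frame and never need to learn, or react
to a change in, the sender's input sampling rate.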

-- 
Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team
rjesup@wgate.com

