Re: [AVT] I-D ACTION:draft-ietf-avt-rtp-speex-01.txt

Randell Jesup <rjesup@wgate.com> Wed, 20 June 2007 06:39 UTC

To: Magnus Westerlund <magnus.westerlund@ericsson.com>
Subject: Re: [AVT] I-D ACTION:draft-ietf-avt-rtp-speex-01.txt
References: <E1Hxptq-000574-Cb@stiedprstage1.ietf.org> <1DE870BB-8980-46E8-A43D-5377BD6106CD@csperkins.org> <46752BBE.5090802@db.org> <ybutzt4ldb7.fsf@jesup.eng.wgate.com> <46779614.2010404@ericsson.com>
From: Randell Jesup <rjesup@wgate.com>
Date: Wed, 20 Jun 2007 02:39:23 -0400
Message-ID: <ybuzm2vxh7o.fsf@jesup.eng.wgate.com>
Cc: jean-marc.valin@usherbrooke.ca, Colin Perkins <csp@csperkins.org>, IETF AVT WG <avt@ietf.org>

Magnus Westerlund <magnus.westerlund@ericsson.com> writes:

>Randell Jesup wrote:
>> "Alfred E. Heggestad" <aeh@db.org> writes:
>>>> Section 3.3: "Sampling rate values of 8000, 16000 or 32000 Hz MUST be
>>>> used.  Any other sampling rates MUST NOT be used" is confusing. Better to
>>>> say "The sampling rate MUST be either 8000 Hz, 16000 Hz, or 32000 Hz".
>>>> Section 4.1.1: "rate" needs to be listed as a required parameter, since
>>>> the codec supports several sampling rates.
>>> I assume that you mean "rate=sampling rate" here. Not sure how to formulate
>>> it in the best way, but I ended up with this:
>>>
>>>    rate: The sampling rate MUST be either 8000 Hz, 16000 Hz, or 32000 Hz.
>> The overloading of sample rate onto the timestamp rate for RTP for (most)
>> audio codecs causes some real problems in certain cases.  When switching
>> (in-call) between codecs with different sample rates, the
>> overloading means that playout time is tricky to calculate, and doubly so
>> if there was a lost packet at the boundary.  (It also could significantly
>> upset jitter buffers and the like, unless you do a full reset on codec
>> shift.)  It may also complicate related RFCs like 2833, since 2833 is
>> normally intermixed with regular codecs:
>> m=audio 4321 RTP/AVP 98 99 100 101
>> a=rtpmap:98 ILBC/8000
>> a=rtpmap:99  G7221/16000
>> a=rtpmap:100 G7221/32000
>> a=rtpmap:101 telephone-event/8000
>> (a=fmtp's would be needed too)
>> Ok: what's the timestamp rate?  :-)  And when can it change?
>
>For this case to have a chance to work you need three telephone-event lines
>like this:
>  a=rtpmap:101 telephone-event/8000
>  a=rtpmap:102 telephone-event/16000
>  a=rtpmap:103 telephone-event/32000
>
>So that the packets are sent at the same timestamp rate as used by the
>main codec.

So long as none of the events span a timestamp rate change...

I have a strong suspicion that including the above is likely to cause
compatibility problems in practice (in addition to being lengthy once the
related fmtp lines are counted, and so adding to the overall problem of
SIP message size over UDP).
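
To make the quoted suggestion concrete, here is a rough Python sketch of
what a sender would have to do with those extra lines: look up the
telephone-event payload type whose clock rate matches the codec currently
in use.  The helper names (parse_rtpmap, event_pt_for) are made up for
illustration, and the SDP below is my example merged with Magnus's three
lines (adding 102 and 103 to the m= line is my assumption); real SDP
handling would also need the fmtp lines.

```python
import re

# Example SDP: the original offer plus the three telephone-event lines.
SDP = """\
m=audio 4321 RTP/AVP 98 99 100 101 102 103
a=rtpmap:98 ILBC/8000
a=rtpmap:99  G7221/16000
a=rtpmap:100 G7221/32000
a=rtpmap:101 telephone-event/8000
a=rtpmap:102 telephone-event/16000
a=rtpmap:103 telephone-event/32000
""".splitlines()

RTPMAP = re.compile(r"a=rtpmap:(\d+)\s+([^/\s]+)/(\d+)")


def parse_rtpmap(sdp_lines):
    """Return {payload_type: (encoding_name, clock_rate)} from a=rtpmap lines."""
    table = {}
    for line in sdp_lines:
        m = RTPMAP.match(line.strip())
        if m:
            table[int(m.group(1))] = (m.group(2), int(m.group(3)))
    return table


def event_pt_for(table, codec_pt):
    """Pick the telephone-event payload type matching codec_pt's clock rate.

    Returns None when no telephone-event line exists at that rate.
    """
    rate = table[codec_pt][1]
    for pt, (name, clock) in sorted(table.items()):
        if name.lower() == "telephone-event" and clock == rate:
            return pt
    return None
```

With all three telephone-event lines present, event_pt_for(table, 99)
picks 102.  With only the original 101 line it returns None for the 16000
and 32000 Hz codecs, which is the gap Magnus is pointing at.  Note that
this does nothing for an event that is still in progress when the codec
(and so the rate) changes.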

Perhaps the real issue here comes down to the fact that a media stream is
really a form of multiplexed channel.  Some payload "switches" are real
switches, others are just passing additional streams of related data (like
2833 DTMF data).
There are no limits on when payloads can switch, or on overlap between
different payloads.  You *could* create a stream with 10 G.711 payloads,
with all 10 arriving at all times, and each one being a different language.
This might even be useful for RTSP, IPTV, etc.

Now in practice devices won't overlap or even switch often, at least for
VoIP devices, with rare exceptions like 2833.  Switches are usually to
respond to network (or processor) impairments, quality, etc.  This probably
works moderately well today in narrowband VoIP devices.  Mixing wideband
(G.722/etc) into the mix really does confuse things if the device doesn't
pick one at call setup and stick with it (or others of the same timestamp
rate).

Either we need to provide guidance on how to handle timestamp shifts, or
we need to move away from them (tough), or we need to move away from
timestamp rate == sample rate.  (Tough, but will help us in the long run.)

>> It probably would have made more sense if (long ago), the timestamp rate
>> had been set on the media line, with a suggestion that something like the
>> smallest value that can (sufficiently) accurately encode the presentation
>> time be selected.  (Example: for 8000 and 16000, choose 16000.  For 7000
>> and 8000, you might need to choose 56000.)  In some cases, a rate that's
>> not an exact multiple of the sample rate might need to be used, to avoid
>> needing to use a Very Large timestamp rate.
>> So - What *should* an application do?  Full reset on codec rate shift,
>> which may imply a short gap in playout, but may let you ignore packet loss?
>> What about 2833/etc?  Only allow codecs with identical timestamp rates on a
>> single media line?  (Causes all sorts of problems if someone wants to use
>> (for example) G.722).
>>
>
>I agree that timestamp rate switching is problematic. And if you have the
>sampling rate switching capability integrated into the codec, then I would
>recommend that you actually use the highest sample rate as RTP timestamp
>rate and ensure that it always provides integer timestamp ticks per
>frame. But for 8, 16 and 32k that is not an issue. Then you include a
>separate parameter to negotiate which sampling rates really are
>allowed. But if the case really is that you are resetting the codec anyway
>when switching sample rates, I think using the timestamp rate would be
>more correct.

You can only use the "highest" timestamp rate if the RFC for that encoding
allows it.  As an example, the Speex draft that started this does not allow
that.
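
Where an encoding's RFC did allow it, picking a shared rate is simple
arithmetic.  As a rough sketch (Python; the function names are mine, not
from any draft), the "smallest rate that encodes every sample rate
exactly" from my earlier mail is the least common multiple, and Magnus's
integer-ticks-per-frame condition is a divisibility check:

```python
from functools import reduce
from math import gcd


def common_timestamp_rate(sample_rates):
    """Smallest rate that is an exact multiple of every sample rate (the LCM)."""
    return reduce(lambda a, b: a * b // gcd(a, b), sample_rates)


def ticks_per_frame(ts_rate, frame_ms):
    """Timestamp ticks in one frame, or None if that is not a whole number."""
    ticks, remainder = divmod(ts_rate * frame_ms, 1000)
    return ticks if remainder == 0 else None
```

For 8000 and 16000 this gives 16000, and for 7000 and 8000 it gives 56000,
matching the examples quoted above.  A 20 ms frame at a 32000 Hz timestamp
rate is 640 ticks.  When the LCM gets Very Large, this exact approach is
the one that would have to be relaxed to a rate that is not an exact
multiple.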

Note: by "codec rate shift" I meant to refer to the general case of
"shift from codec with sample rate N to codec with sample rate M", whether
the two codecs are the same or not.  For example, G.711 to G.722.1.

Full reset is something to avoid if possible, since users of VoIP want to
switch codecs on the fly in mid-call based on network conditions, etc.  You
want to be able to (fairly) seamlessly have a call be G.711 up to 1999ms,
and G.722.1 starting at 2000ms, if possible, and play with little or no gap
or artifacts.  The user might have just pressed a DTMF button before this
happened, and still be holding it down.
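
To illustrate why the boundary is the hard part, here is a small Python
sketch.  It assumes the sender simply keeps incrementing the timestamp in
units of whichever clock is current, which is an assumption on my part and
not something the drafts discussed here pin down; elapsed_bounds_ms is a
made-up name.  If packets are lost around the switch, a receiver that sees
only the timestamp gap cannot tell how much of it was counted at the old
rate, so it can only bound the elapsed time:

```python
from fractions import Fraction


def elapsed_bounds_ms(ts_delta, old_rate, new_rate):
    """Bounds (min, max) in ms on the media time covered by ts_delta ticks.

    The gap spans a switch from old_rate to new_rate, and the receiver does
    not know where inside the gap the switch happened.
    """
    at_old = Fraction(ts_delta * 1000, old_rate)  # whole gap at the old clock
    at_new = Fraction(ts_delta * 1000, new_rate)  # whole gap at the new clock
    return min(at_old, at_new), max(at_old, at_new)
```

A gap of 480 ticks across a G.711 (8000 Hz) to G.722.1 (16000 Hz) switch
gives bounds of 30 ms and 60 ms, which is a lot of slack for a jitter
buffer to absorb without an audible gap or artifact.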

-- 
Randell Jesup, Worldgate (developers of the Ojo videophone), ex-Amiga OS team
rjesup@wgate.com
"The fetters imposed on liberty at home have ever been forged out of the weapons
provided for defence against real, pretended, or imaginary dangers from abroad."
		- James Madison, 4th US president (1751-1836)

