Re: [codec] #24: Negotiation of codec parameters?

"codec issue tracker" <> Sun, 02 May 2010 08:32 UTC

#24: Negotiation of codec parameters?
 Reporter:  hoene@…                 |       Owner:     
     Type:  enhancement             |      Status:  new
 Priority:  minor                   |   Milestone:     
Component:  requirements            |     Version:     
 Severity:  -                       |    Keywords:     

Comment(by hoene@…):

 [kpfleming@]: If our goal is to use RTP AVP/SAVP/AVPF/SAVPF profiles for
 transport (as seems likely), then differences in sample rates between
 offers must be listed separately in the SDP. Whether they have a
 different codec 'name' in the SDP or not seems less important, because
 the combination of the codec name and sample rate is required to
 uniquely identify the format in any case. Note that this is *sample rate*,
 and not bitstream rate.

 [Christian]: No, please don't. Please keep the interface to the codec as
 simple as possible! In addition, one must consider the following:

 1) First, the sample rate MUST be changed dynamically to cope with varying
 transmission bandwidths.

 [Stephen]: Dynamically changing sample rates on the system level adds some
 complexity for RTP, since the timestamp granularity is supposed to be the
 sample rate. BTW, dynamically changing the sample rate may be in conflict
 with the idea of low-complexity compressed-domain mixing (even if the
 conversion is done internally).

 [Kevin]: If the desire is for the codec to be able to change sample rates
 to adjust to network conditions, then I agree with Stephen... the
 'external' sample rate (input to the encoder and output from the decoder)
 should be fixed, and this is what would be negotiated in SDP and used for
 RTP timestamps. The codec can downsample in the encoder and upsample in
 the decoder if it has decided to transmit fewer bits across the network.
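
 A minimal sketch of the point above, assuming a 48 kHz external rate and
 20 ms frames (both values, and all function names, are illustrative and
 not taken from the thread): the RTP timestamp advances at the negotiated
 clock rate no matter which internal rate the encoder picks per frame.

```python
# Hypothetical sketch: RTP timestamps follow the negotiated (external)
# sample rate, whatever rate the codec uses internally.

EXTERNAL_RATE_HZ = 48000  # assumed value negotiated in SDP
FRAME_MS = 20             # assumed frame duration


def timestamp_increment(external_rate_hz, frame_ms):
    """RTP timestamp ticks per frame at the negotiated clock rate."""
    return external_rate_hz * frame_ms // 1000


def rtp_timestamps(first_ts, internal_rates_hz):
    """One timestamp per frame; each frame's internal rate is ignored."""
    step = timestamp_increment(EXTERNAL_RATE_HZ, FRAME_MS)
    # RTP timestamps are 32-bit and wrap around.
    return [(first_ts + i * step) % 2**32
            for i in range(len(internal_rates_hz))]


# The encoder drops to 16 kHz and then 8 kHz internally, yet the
# timestamps keep advancing by the same amount per frame.
stamps = rtp_timestamps(0, [48000, 16000, 8000, 48000])
```

 Because the step depends only on the external rate, the jitter buffer and
 the timestamp arithmetic at the receiver are untouched by the encoder's
 internal decisions.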

 [Stephen]: Something like: CODEC MAY reduce the acoustic bandwidth at
 lower bit rates in order to optimize audio quality. This is free of any
 technology assumption about how the acoustic bandwidth is reduced.  The
 MAY indicates that it is permissible.  But if the CODEC algorithm doesn't
 need to reduce the acoustic bandwidth, then we are making no statement
 that it SHOULD (or SHOULD NOT).

 Kevin is distinguishing dynamic changes to the sample rate (for bandwidth
 management) from multiple fixed sample rates; and I agree that is a key

 I have not heard any clear application requirement for more than one fixed
 sampling rate.  Though if there is such a requirement, IMHO we would have
 to negotiate the rate within SDP in the usual way, and it would affect the
 RTP timestamps, jitter buffers, etc.  G.722.1 / G.722.1C is one precedent
 - it is the same core codec, but can run at two different sample rates
 (negotiated by SDP).
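
 As an illustration of that precedent, here is a small sketch that parses
 an SDP fragment offering G.722.1 at both rates. The fragment is modeled on
 the style of the G.722.1 RTP payload format; the port, the payload type
 numbers and the helper name parse_rtpmap are illustrative assumptions.

```python
# SDP fragment with the same codec name at two clock rates.
# Port and payload type numbers are illustrative.
SDP_OFFER = """\
m=audio 49000 RTP/AVP 121 122
a=rtpmap:121 G7221/16000
a=fmtp:121 bitrate=24000
a=rtpmap:122 G7221/32000
a=fmtp:122 bitrate=48000
"""


def parse_rtpmap(sdp):
    """Map payload type -> (encoding name, clock rate in Hz)."""
    formats = {}
    for line in sdp.splitlines():
        if not line.startswith("a=rtpmap:"):
            continue
        # "a=rtpmap:<pt> <name>/<rate>[/<channels>]"
        pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
        name, rate = encoding.split("/")[:2]
        formats[int(pt)] = (name, int(rate))
    return formats


formats = parse_rtpmap(SDP_OFFER)
```

 Each rate gets its own payload type, so the name and clock rate together
 identify the format, and the RTP timestamp clock differs between the two.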

 [Christian]:  It still might make sense to negotiate the maximal supported
 sampling rate via SDP or, if possible, to select one out of multiple
 sampling rates, if the audio receiver can cope with multiple rates well.
 The internal sampling frequency of the codec NEED NOT be affected by
 the external sampling frequency.

 However, the decoder might want to signal to the encoder that the decoding
 is requiring too many computational resources and that a less complex
 coding mode (or a lower sampling frequency) should be taken.

 [Stephen]: This would make the signaling more complicated - personally I
 am not convinced it is worth it.  I think a better avenue is to bound
 overall complexity, and to focus on dynamically adapting to network
 conditions (as opposed to dynamic complexity management). You can't
 dynamically negotiate complexity in many scenarios anyway - for instance
 it makes no sense if you are using multicast.

 > This would make the signaling more complicated - personally I am not
 convinced it is worth it.
 It is a difficult tradeoff. However, signaling overload is done in Skype.
 Such signaling might be very useful for mobile devices, which want to
 save power and thus lower their CPU clock. Or wireless IP based headphones
 which do not have large batteries. I am thinking of signaling the states:
 overloaded, fine, and low. That should be enough for most operational
 > I think a better avenue is to bound overall complexity, and to focus on
 dynamically adapting to network conditions (as opposed to dynamic
 complexity management).

 I would just like to point out that good old TCP supports both: congestion
 control to adapt to network conditions, and flow control to take into
 account an overloaded (= full) receiver.
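
 A rough sketch of how the three receiver states mentioned above might
 drive the encoder. Everything here is hypothetical: the 0..10 complexity
 scale, the one-step adjustment, and the reading of "low" as "the receiver
 has spare capacity" are assumptions, not anything specified in this
 thread.

```python
from enum import Enum


class ReceiverLoad(Enum):
    """Three-state load report sent by the receiver (hypothetical)."""
    OVERLOADED = "overloaded"
    FINE = "fine"
    LOW = "low"


MIN_COMPLEXITY = 0   # assumed cheapest decoding mode
MAX_COMPLEXITY = 10  # assumed most expensive decoding mode


def next_complexity(current, load):
    """Pick the encoder's next complexity level from the receiver report."""
    if load is ReceiverLoad.OVERLOADED:
        # Receiver cannot keep up: step down, like flow control backing off.
        return max(MIN_COMPLEXITY, current - 1)
    if load is ReceiverLoad.LOW:
        # Receiver has headroom: the sender may step up again.
        return min(MAX_COMPLEXITY, current + 1)
    return current
```

 This is independent of any bit-rate adaptation, in the same way that TCP
 flow control is independent of congestion control.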

 [Stephen]: TCP is a different case, since for this we are using RTCP to
 signal our feedback, and I don't think it has the facility you are
 This concept seems pretty theoretical to me.  If we need to manage
 complexity / quality tradeoffs, why not just use profiles (as AVC/H.264
 does) or create a low complexity variant (like G.729A).  I really don't
 see the need for dynamic complexity management.

 BTW, you seem to be assuming that a lower sample rate results in
 significantly less complexity.  The savings there might not be as great as
 you think, especially if the receiver needs to resample anyway (to prevent
 those sound card limitations you were talking about before).

 [Roman]: RTCP is almost universally not implemented. The biggest VoIP
 gateway on the market does not generate RTCP. If we rely on any RTCP
 functionality for bandwidth control, it will probably be ignored.

 [Stephen]: Videoconferencing devices do almost always support RTCP.  It is
 regrettable that so many VOIP devices do not.  Anyway, I do not think our
 charter scope includes invention of a new mechanism for signaling the
 network quality.

 [Roman]: My point about RTCP was that we should try to develop a CODEC that
 functions properly when RTCP is absent. If we require RTCP-based mechanisms
 for the CODEC to operate properly, this can impede the adoption of this
 CODEC. In no way do I propose to create new signaling mechanisms.

 [Stephen]: I rather like the idea of negotiating maximum audio bandwidth.
 For me that is different from dynamic complexity management, and is being
 signaled for a different purpose (wasting coded bits on unheard
 spectrum degrades the quality of the heard spectrum).

  Why would it need to be negotiated?  For a suitably designed format, the
 encoder could choose not to waste bits on high frequencies without any
 negotiation or extra signalling.
 I do agree that having "only one mode" would be ideal, to maximize
 interoperability.  I wonder whether we can achieve high enough
 computational efficiency for this to be viable.

 [Koen]: Not all hardware supports arbitrary/high sampling rates.  PSTN
 gateways don't go above 8 kHz.  Same for some mobile devices. Without
 signaling, how would the encoder know that the far-end decoder will not
 take advantage of frequencies above a certain threshold?

 [Ben]: When I say signalling, I mean signalling within the codec
 bitstream.  The encoder can change its behavior based on knowledge of the
 receiver's configuration, but the bitstream does not need any extra
 signalling to indicate the change in behavior.

 If the receiver is a PSTN gateway, then an "internal codec rate" of 8 kHz
 would presumably produce as good quality/bitrate with lower encoder and
 decoder complexity.  However, if we can make IWAC sufficiently
 low-complexity, operating at 48 kHz may be acceptable.  It will help if we can
 structure the codec so that operating at lower bandwidth is very

 [Stephen]: When I said signaling I meant SDP, not anything in the
 bitstream itself.  I was not excluding audio bandwidth changes mid-call as
 part of network adaptation.  Though as we all agree this needs to be
 carefully designed.

 I agree it is best if the decoder does not require any knowledge of the
 SDP negotiation (or any other information beyond the RTP packet stream
 itself) in order to correctly decode the audio -- which I think is what
 you were concerned about.

 It would be a nice property if reducing the acoustic bandwidth also
 allowed the MIPS to be reduced, but I do not think it is a requirement;
 I'd personally rather manage complexity with a Low Complexity profile (if
 that is really needed), since then I could keep the acoustic bandwidth
 (accepting a higher bit rate instead).

 [Christian]: Negotiating codec parameters with SDP has a long tradition.
 Take for example µLaw (RTP payload type 0): Here you negotiate the
 sampling rate. Also, the number of channels is negotiated for many
 codecs. I think sampling rate and number of channels can be done with SDP.
 However, I would avoid other codec-specific parameters. In particular, the
 negotiation in the case of AMR is quite complex, and that should be avoided
 for the Internet CODEC.

 [Stephen]: My point here was not that SDP negotiation should be avoided.
 I was trying to say that it is best if the RTP payload is complete (in
 the sense that it can be fully decoded even if you ignore the signaling,
 as long as you know the codec itself).  For instance, if you negotiate
 the number of channels, it should be possible for the decoder to identify
 the number of channels from the RTP payload.
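
 To make the idea concrete, here is a sketch of a self-describing payload.
 The one-byte header layout, the bandwidth table and the function names are
 invented for illustration; they do not come from any existing payload
 format or from a proposal in this thread.

```python
# Hypothetical 1-byte payload header (not from any real payload format):
#   bits 7..6  channel count minus one (1 to 4 channels)
#   bits 5..4  audio bandwidth index into BANDWIDTHS_HZ
#   bits 3..0  reserved, sent as zero
BANDWIDTHS_HZ = (4000, 8000, 12000, 20000)  # assumed table


def pack_header(channels, bandwidth_index):
    """Build the header byte that the encoder prepends to each payload."""
    if not 1 <= channels <= 4:
        raise ValueError("channels must be 1..4")
    if not 0 <= bandwidth_index < len(BANDWIDTHS_HZ):
        raise ValueError("bandwidth index out of range")
    return bytes([((channels - 1) << 6) | (bandwidth_index << 4)])


def parse_header(payload):
    """Recover (channels, bandwidth in Hz) from the payload alone."""
    first = payload[0]
    channels = (first >> 6) + 1
    bandwidth_index = (first >> 4) & 0x03
    return channels, BANDWIDTHS_HZ[bandwidth_index]


# The decoder needs no SDP knowledge to learn the current mode.
payload = pack_header(2, 3) + b"coded-audio"
mode = parse_header(payload)
```

 The cost is a few payload bits per packet; the benefit is that a mode
 change takes effect with the very packet that carries it.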

 There are codecs where this is not done.  Though personally I think it is
 the best architectural approach, even if it costs some payload bits.  One
 reason is that I think changing modes should be seamless, and there is a
 race condition between the signaling and the RTP payload. If you are
 adapting to network conditions, it is particularly useful to change on the

 [Roni]: Negotiation of codec parameters is not a tradition; it is needed
 if there are optional modes that the decoder can support, in order to let
 the sender know whether the receiver can receive a specific mode. If there
 are mandatory modes you may be able to provide the information in-band, but
 this is not negotiation. Also note that while the signaling may use a
 reliable channel, the media path is not reliable and may suffer packet loss
 that may cause the loss of important parameters. We have such an example in
 the H.264 parameter sets, which can be carried in the SDP for reliability
 or in-band as part of the payload.

 Personally I favor carrying those H.264 parameter sets on the media path,
 since there are situations (switched multipoint calls for one) where the
 timing matters.  With that use case, if reliable-but-too-late delivery
 occurs, there are decoding errors even if there is no packet loss.
 Though of course SDP transmission alone may be suitable for other
 applications, and it is perfectly legal to send them both ways.

 [Christian]: I am fine with dropping any SDP negotiation on codec
 parameters including sampling rate and channels. I like the idea of
 splitting signaling and transportation issues.
 But one question remains. We had the question of limiting the complexity
 for some kinds of devices by choosing a lower sampling rate or a lower
 number of channels. Should this negotiation be done with SDP or in-band?

 [Stephen]: All negotiation should be done with SDP (and should never be
 done in-band). And the RTP transport should be robust enough to permit
 seamless changes to any mode that is consistent with the negotiation (with
 no signaling).
 The first point I think is essential.  The second reflects my own view on
 how RTP packetization should be done.

 [Christian]: I am getting confused… Do you mean that the parameter about
 sampling rate MUST be negotiated with SDP and not transmitted in-band?
 Or MUST NOT be negotiated in-band but only transmitted in-band?
 In-band means within RTP/RTCP/RTP extensions and/or the Internet CODEC

 [Stephen]: In-band for me in this case means RTP only.  Or in some other
 contexts RTP payload only.

 I think there is
 (a) negotiation - particularly defining what optional modes the receiver
 can handle.  In the case of sample rates, number of channels, and any
 optional facilities, this is generally not changing mid-call.  SDP is the
 right place for this. (SDP SHALL be used for this)

 (b) Feedback - messages related to QOS, packet loss, etc are what I mean
 by this.  This should be done in RTCP, though given the lack of VOIP
 support perhaps there should be a SIP INFO backup path.  Feedback should
 not be done with RTP.

 (c) Control - Per RFC 3551, RTCP can be used for "loosely controlled"
 sessions, but "may be fully or partially subsumed by a separate session
 control protocol".  Given the statements on RTCP support in the VOIP
 infrastructure, we should be careful about putting unique controls in
 RTCP.  However, it should not be done in RTP.  Since most other audio
 codecs don't require this stuff, I suspect we won't either.  Though we
 will see...

 (d) In-band (RTP).  Not sure how else to say this.  Ideally RTP streams
 carrying CODEC (with no out-of-band, no RTCP, no SDP parameter knowledge)
 can be decoded.  That is, I think it should be possible to fully decode
 unencrypted CODEC bitstreams with only the RTP packets abstracted from
 Wireshark.  Even if the operating mode changes mid-flight.  RFC 5404
 (G.719) is one example of a packetization that accomplishes this, but
 there are other packetization RFCs which do not.

 [Schwarz]: You should replace "SDP" by "SDP Offer/Answer" (protocol), in
 order to emphasize the requirement for a) indication and possibly b)
 negotiation of codec configurations.
 The number of potential parameters you are mentioning could result in a
 number of various codec configurations.
 If this is the case, then the SDP O/A extensions would be recommended:
 which implies also support of

 [Instead of the SDP Offer/Answer defined in RFC 3264, which may be
 sufficient for a single codec configuration but is definitely insufficient
 in the case of multiple codec configurations]
