Re: [codec] Format for the codec specification

Jean-Marc Valin <jean-marc.valin@usherbrooke.ca> Wed, 29 September 2010 10:55 UTC

Message-id: <4CA31B43.5050107@usherbrooke.ca>
Date: Wed, 29 Sep 2010 06:56:03 -0400
From: Jean-Marc Valin <jean-marc.valin@usherbrooke.ca>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.12) Gecko/20100915 Thunderbird/3.0.8
To: Stephan Wenger <stewe@stewe.org>
References: <C8C816CB.24BBB%stewe@stewe.org>
In-reply-to: <C8C816CB.24BBB%stewe@stewe.org>
Cc: "codec@ietf.org" <codec@ietf.org>, Stephen Botzko <stephen.botzko@gmail.com>
Subject: Re: [codec] Format for the codec specification

Hi Stephan,

> In summary, I continue to argue in favor of the MPEG model: (bit-exactly)
> standardize the bitstream and the decoder operation on it.

Actually, to the best of my knowledge, MPEG does *not* specify the 
decoder in a bit-exact way. IIRC the spec has a floating-point decoder 
and then says by how much the output is allowed to deviate from the 
reference decoder. The same strategy is used in EVRC (A and B). This 
makes sense. Otherwise, we can argue for years over how exactly to 
define a multiplication (e.g. whether to round and how) because there are 
about a dozen different ways to define some of these basic operations.
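
To make the point concrete, here is a minimal sketch (not taken from any 
spec) of three plausible definitions of a Q15 fixed-point multiply, plus 
a tolerance-based comparison against a reference output. The function 
names, the Q15 format and the tolerance rule are all assumptions made 
for illustration only.

```python
# Illustrative only: none of these definitions comes from a codec spec.

def mul_q15_trunc(a: int, b: int) -> int:
    """Arithmetic shift right: rounds toward minus infinity."""
    return (a * b) >> 15

def mul_q15_round(a: int, b: int) -> int:
    """Add half an LSB before shifting: rounds half up."""
    return (a * b + (1 << 14)) >> 15

def mul_q15_toward_zero(a: int, b: int) -> int:
    """Truncate toward zero, as C integer division would."""
    p = a * b
    return -((-p) >> 15) if p < 0 else p >> 15

def max_abs_deviation(reference, candidate):
    """Largest sample-wise difference between two decoder outputs."""
    return max(abs(r - c) for r, c in zip(reference, candidate))

def conforms(reference, candidate, tolerance):
    """Hypothetical rule: same length, every sample within `tolerance`."""
    return (len(reference) == len(candidate)
            and max_abs_deviation(reference, candidate) <= tolerance)

if __name__ == "__main__":
    samples = [3, -3]
    gain = 16384  # 0.5 in Q15
    for mul in (mul_q15_trunc, mul_q15_round, mul_q15_toward_zero):
        print(mul.__name__, [mul(s, gain) for s in samples])
```

With these inputs the three definitions give three different outputs, 
yet they never differ by more than one LSB, so a tolerance-based check 
would accept all of them where a bit-exact check would accept only one.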

	Jean-Marc

On 10-09-29 01:03 AM, Stephan Wenger wrote:
> Hi all, especially Jean-Marc and Stephen,
>
> I think you are both thinking too technically about this question.  My
> understanding is that the speech codecs from the ITU and from 3GPP and 3GPP2
> are/have been mostly telco-driven developments.  In many jurisdictions,
> telcos have to offer their customers a guaranteed quality level.  It is
> easier for them to ensure that level to be achieved by disallowing vendor
> differentiation in quality.
>
> Personally, I believe that a codec targeted for a best effort network is
> best designed without a strict encoder specification.
>
> That said, I continue to argue that we will be better off with a bit-exact
> decoder design.  Again, the reason is not technical, but (patent-) licensing
> related.  While it is comparatively easy, in a bit-exact decoder design, to
> determine whether a given patent claim is essential to practice the
> standard, this is not the case for a design that is based on bitstream
> syntax, aspects of the decoder operation, and minimum performance
> requirements (which, I believe, is roughly where some people are heading).
>
> Note that it is even more important for a royalty-free codec development to
> have a clear understanding about which claims are essential, than for a RAND
> codec.  The reason is simple: in the RF case, the development has to stay
> clear of any and all claims that may not be available under "exotic" (in
> this industry) RF terms, whereas for a RAND codec, people only have to worry
> about the inclusion of claims that may not be available for RAND licensing
> at all (which are very few in this industry).  As a result, a non bit-exact
> decoder design has to be much more conservative in exercising technology (as
> more claims may be swept in by advanced designs) than a bit-exact decoder
> design, neutralizing, IMO, most of the positive effects a non bit-exact
> design may have from a performance viewpoint.
>
> In summary, I continue to argue in favor of the MPEG model: (bit-exactly)
> standardize the bitstream and the decoder operation on it.
>
> A test-model level encoder design document is desirable as well, as could be
> minimum performance specs for the encoder, but, IMO, neither needs to be
> normative.
>
> Regards,
> Stephan
>
> On 9.28.2010 19:46 , "Jean-Marc Valin"<jean-marc.valin@usherbrooke.ca>
> wrote:
>
>> On 10-09-28 10:28 PM, Stephen Botzko wrote:
>>> Though I've never heard clear reasons, it seems to me that the
>>> "decoder-only" folks are mostly focused on applications with a massive
>>> encoder/decoder imbalance, while the "encoder and decoder" folks tend to
>>> be focused on applications with equal numbers of encoder and decoders.
>>
>> I think the main logic here is that when your codec leaves very few
>> degrees of freedom to the encoder (the extreme case being an ADPCM
>> codec, but CELP generally falls into that as well), then it's fine to
>> have the encoder be normative. Voice codecs have typically fallen
>> into that category because they were smaller than music codecs. Where I
>> think it makes sense to *not* have a normative encoder is when the
>> encoder has a lot of freedom. For example, an MP3 encoder has the
>> freedom to define its own psychoacoustic model, decide when to use
>> short windows, and so on. And we have seen just how much MP3 encoders
>> have improved over the years. This would have been impossible if the
>> encoder had been specified normatively.
>>
>> I would say that the current codec we have here is closer to MP3 than it
>> is to (e.g.) G.729. There is a *lot* of freedom in the encoder. Not only
>> in how to switch between its three main modes (SILK, CELT, hybrid), but
>> within each of these modes. Because of that, I believe that encoders
>> will continue to evolve and get better over time, just like they did for
>> MP3.
>>
>>> So I am thinking that making the encoder normative makes sense, given
>>> that this application is centered on VOIP, so we want to ensure
>>> consistent quality in all endpoints.
>>
>> I don't really see a problem with quality. Most implementors will likely
>> end up using either the reference encoder or improvements made to it
>> over time. I don't think anyone will complain about having better quality
>> than the standard specified.
>>
>> Cheers,
>>
>> Jean-Marc