Re: [codec] Format for the codec specification

Cullen Jennings <fluffy@cisco.com> Fri, 01 October 2010 17:36 UTC

These are all valid concerns, but I think people have often found that providing code is the "least bad way" of specifying codecs. I did want to point out that bugs are found in specs all the time, regardless of whether they are C code, English text, or something else. We have an errata process for updating RFCs as well as ways to publish new versions of RFCs. I think that is about the best we can do. It's also easy to accidentally write ambiguous text like your C example below. Again, this happens in both English and C; we just need to do our best to keep an eye open for those cases and fix them. That's one of the advantages of having "running code": it helps discover the corner cases that are underspecified. The code tends to be "reference" code in that it is written to help illustrate the algorithm; it is often not the optimal way to implement the algorithm, may not be thread safe, may need changes to be ported to some platforms, and so on. C may not be the best for this, but it has the advantage of being widely understood, particularly in DSP codec circles. We could have a huge argument about which language to use for a long time. I have seen that argument before and it's not fun. In the end, the odds we would choose C are extremely high.

Cullen


On Sep 25, 2010, at 2:55 AM, Riccardo Bernardini wrote:

> Quoting Jean-Marc Valin <jean-marc.valin@usherbrooke.ca>:
> 
>> Hi Roni,
>> 
>> Thanks for the correction on the case of G.719. Despite that, my point still stands: there *are* codecs for which the decoder is standardized without a bit-exact definition. As long as we define the amount of deviation permitted, then there is no problem.
>> 
>> Oh, and I do agree that the C code should take precedence over the mathematical description.
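
A minimal sketch of what "define the amount of deviation permitted" could look like as a conformance check, assuming 16-bit PCM output and a per-sample absolute error bound. The function name within_tolerance and the parameter max_dev are hypothetical, and a real conformance test would more likely use a perceptual or SNR-based criterion agreed on by the working group.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conformance check (not from any standard): returns 1 if
 * every sample of the decoder under test is within max_dev of the
 * reference decoder's output, and 0 otherwise. */
int within_tolerance(const int16_t *ref, const int16_t *test,
                     size_t n, int32_t max_dev)
{
    size_t i;

    for (i = 0; i < n; i++) {
        /* Widen to 32 bits first so the difference cannot overflow. */
        int32_t diff = (int32_t)ref[i] - (int32_t)test[i];

        if (diff < 0)
            diff = -diff;
        if (diff > max_dev)
            return 0;
    }
    return 1;
}
```

Using the fixed-width types from stdint.h keeps the arithmetic identical on platforms where int and short have different sizes.
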
> 
> A question just came to my mind: what about bugs in the C code?  Suppose, for example, that the algorithm requires the computation of a DCT of a block of samples and that there is some typo in the constants used in the C code, so that what the C code computes is not the DCT but something similar, and the difference is such that it really does not matter in most cases but causes very bad performance (when compared with the theoretical description) in others.  In this case, the precedence would go to the "wrong" description.
> 
> I agree that the example above is somewhat weak (I just invented it "on the spot"), but I hope that the spirit is clear: despite all the attention we could pay, some subtle but important bugs could find their way into the C code.  Also, I agree that we can always publish an erratum or an RFC that obsoletes the old one.
> 
> Another doubt about having C code as a normative reference: could portability issues make the code "ambiguous"?  For example, code could rely on the assumption that int is _exactly_ 32 bits long and short is _exactly_ 16 bits.  If the same code is run on a processor that uses, say, 64 bits for int and 32 for short, the results could be different.  Of course, you can solve this _specific_ issue by including stdint.h (not available everywhere, though) and using types like int16_t, but what I want to point out is that we must pay attention to this type of portability issue if we want the meaning of the C code to be unambiguous.  Another example, much more subtle, could be
> 
>  int foo = 3;
>  int goo;
>  int *pt;
> 
>  pt  = (int*) foo;
>  goo = (int) pt;  /* Is really goo==foo? */
> 
> As far as I know (but I am not a C language lawyer), the standard guarantees that a pointer-to-integer-to-pointer conversion will give the original pointer back, but nothing is said about the case above (imagine an architecture where addresses can only be even).  Although the code above would work on a typical x86 architecture, I would not use it in code that should play the role of a reference.
> 
> By the way, I agree that the code above is quite silly, but I have seen things like this used, for example, in multi-thread applications where the programmer wanted to pass an integer by using a function that expected a pointer.
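
A minimal sketch of the two directions discussed above, assuming a C99 compiler whose stdint.h provides the optional type intptr_t. The function names are hypothetical. The first function exercises the round trip that C99 does guarantee (void pointer to intptr_t and back). The second shows one portable alternative to smuggling an integer through a pointer-typed argument: pass the address of the integer instead.

```c
#include <stdint.h>

/* C99 guarantees, where intptr_t exists, that a valid void pointer
 * converted to intptr_t and back compares equal to the original. */
int pointer_round_trips(void *p)
{
    intptr_t as_int = (intptr_t)p;
    void *back = (void *)as_int;

    return back == p;
}

/* Portable way to hand an integer to a callback that takes void *:
 * pass the integer's address rather than casting its value. The
 * caller must keep the pointed-to int alive while it is in use. */
int read_int_arg(void *arg)
{
    return *(const int *)arg;
}
```

The reverse direction (integer to pointer and back to integer) has an implementation-defined result, which is why it is best kept out of code meant to serve as a reference.
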
> 
> What about using some other language with fewer portability issues, such as Ada or Java?  Better yet, some language with formally defined semantics? (I do not know any, but maybe someone does)
> 
> [Oh, by the way, I *do not want* to start a flame war about which language to use.  Rather than fighting over it, I would go with C.  I just wanted to share with you some doubts of mine.]
> 
> Although I agree that errors and ambiguities could creep into the mathematical description, maybe it is easier to write a correct and unambiguous mathematical description than program code (in whatever language it is written).
> 
> 
> 
>> Cheers,
>> 
>> 	Jean-Marc
>> 
>> On 10-09-24 06:06 PM, Roni Even wrote:
>>> 6.5 Description of the codec
>>> The description of the coding algorithm of this Recommendation is made in terms of bit-exact fixed-point mathematical operations. The ANSI-C code indicated in clause 10, which constitutes an integral part of this Recommendation, reflects this bit-exact, fixed-point descriptive approach. The mathematical descriptions of the encoder and decoder can be implemented in other fashions, possibly leading to a codec implementation which does not comply with this Recommendation. Therefore, the algorithm description of the ANSI-C code of clause 10 shall take precedence over the mathematical descriptions whenever discrepancies are found. A set of test signals, which can be used together with the ANSI-C code in order to verify bit-exactness, is available as an electronic attachment to this Recommendation.
>>> 
>>> 
>>> Roni Even
>>> 
>>> 
>>> 
>> _______________________________________________
>> codec mailing list
>> codec@ietf.org
>> https://www.ietf.org/mailman/listinfo/codec
>> 
> 
> 
> 
> -- 
> Riccardo Bernardini
> DIEGM -- University of Udine
> via delle Scienze 208
> 33100 Udine
> Tel: +39-0432-55-8271
> Fax: +39-0432-55-8251
> 