Re: [codec] Comment on Opus Draft 07: Section 4 Range decoder

"Timothy B. Terriberry" <tterribe@xiph.org> Thu, 11 August 2011 18:19 UTC

Return-Path: <prvs=197eb40b6=tterribe@xiph.org>
X-Original-To: codec@ietfa.amsl.com
Delivered-To: codec@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 1420121F8B5F for <codec@ietfa.amsl.com>; Thu, 11 Aug 2011 11:19:17 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.307
X-Spam-Level:
X-Spam-Status: No, score=-1.307 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, MISSING_HEADERS=1.292]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 0wmvE+Ra0KpJ for <codec@ietfa.amsl.com>; Thu, 11 Aug 2011 11:19:16 -0700 (PDT)
Received: from mxip0i.isis.unc.edu (mxip0i.isis.unc.edu [152.2.0.72]) by ietfa.amsl.com (Postfix) with ESMTP id A055D21F8A57 for <codec@ietf.org>; Thu, 11 Aug 2011 11:19:15 -0700 (PDT)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Ap4EAFscRE6sGgRS/2dsb2JhbABBpnSBXIFAAQEEAScRQQULCyElDwJGEwEHAodrvSyGRwSHX4svhHwPjAQ
X-IronPort-AV: E=Sophos;i="4.67,357,1309752000"; d="scan'208";a="213155565"
Received: from mr1a.isis.unc.edu (HELO smtp.unc.edu) ([172.26.4.82]) by mxip0o.isis.unc.edu with ESMTP; 11 Aug 2011 14:19:36 -0400
X-UNC-Auth-As: tterribe
X-UNC-Auth-IP: 69.77.151.28
Received: from [172.31.0.5] (69-77-151-28.skybest.com [69.77.151.28]) (authenticated bits=0) by smtp.unc.edu (8.14.4/8.14.3) with ESMTP id p7BIJWZr008506 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for <codec@ietf.org>; Thu, 11 Aug 2011 14:19:35 -0400 (EDT)
Message-ID: <4E441D2F.10802@xiph.org>
Date: Thu, 11 Aug 2011 11:19:27 -0700
From: "Timothy B. Terriberry" <tterribe@xiph.org>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101120 Gentoo/2.0.10 SeaMonkey/2.0.10
MIME-Version: 1.0
CC: codec@ietf.org
References: <008501cc4c6f$e1f7b550$a5e71ff0$@uni-tuebingen.de>
In-Reply-To: <008501cc4c6f$e1f7b550$a5e71ff0$@uni-tuebingen.de>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Subject: Re: [codec] Comment on Opus Draft 07: Section 4 Range decoder
X-BeenThere: codec@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Codec WG <codec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/codec>, <mailto:codec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/codec>
List-Post: <mailto:codec@ietf.org>
List-Help: <mailto:codec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/codec>, <mailto:codec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 11 Aug 2011 18:19:17 -0000

> 1) You might want to move Figure 8 to this section or - even better - add a
> small new figure showing range encoded bits and raw bits at the beginning
> and the ending of a frame.

How about the following:

  0               1               2               3
  7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Range coder data (packed MSb to LSb) ->                       :
+                                                               +
:                                                               :
+     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:     | <- Boundary occurs at an arbitrary bit position         :
+-+-+-+                                                         +
:                          <- Raw bits data (packed LSb to MSb) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


> 2) The sentence  "Let f[i] be the frequency of the _i_th symbol in a context
> with _n_ symbols total." is not complete. Instead I would write "Let f[i] be
> the frequency of the _i_th symbol. There are _n_ symbols in total and index
> _i_ ranges from 0 to n-1."

I modified this suggestion slightly:

"Suppose there is a context with n symbols, identified with an index 
that ranges from 0 to n-1. Let f[i] be the frequency of symbol i."

I don't want to say "index _i_ ranges from 0 to n-1", specifically, 
because there is also an index k, with the same range, and it seems odd 
to call out one and not the other. I also re-ordered the sentences 
slightly so I could introduce k in conjunction with your suggestion 4).

> 3) I am not sure about the rules of using "_something_". Maybe, it would
> make sense to use the notion for _variable_, only, if at all.

I was using <spanx style="emph"> in the XML as is normal for newly 
defined terms, and as a replacement for math mode for symbols. It looks 
as expected in the HTML output. However, I agree the rendering in the 
.txt output looks particularly poor. My plan is to change to using 
double-quotes for such terms. I'm not sure what to do about variables 
(it is sometimes useful to distinguish them, especially when their name 
is a normal English word), but agree that the _emph_ notation looks poor.

> 4) Also, fl should be renamed to fl[k] and fh to fh[k] to make things more
> clear. This also implies a couple of changes in the following sections.

I dropped the k index to more closely match the usage of the variables 
in the code, as scalars independent of the index, rather than the 
contents of an array. I'm happy to add it back, though, if you think it 
makes things more clear.

> 5) I would change "It then immediately normalized ..." into "The remaning
> bit is used in Renormalization (Section 4.1.1.1) function, which is
> immediately calld to reads further bits from the input so that rng become
> than rng>  2**23.".  Otherwise, one cannot understand (if reading only
> serial) the equations in 4.1.1.

How about:

The remaining bit is saved for use in the renormalization procedure 
described in Section 4.1.1.1, which the decoder invokes immediately 
after initialization to read additional bits and establish the invariant 
that rng > 2**23.

> Section 4.1.1
> ==========
> 1) In this paragraph, "The decoder ... identified symbol" one can introduce
> the index k to denote the index of the identified symbol.

Agreed.

> 2) I have a question in respect of the "rng" variable for k=0: Why do you
> use (ft-fh) instead of fh in the equation "rng = rng - (rng/ft)*(ft-fh)".
> Because of the reason given in the following paragraph? "With this
> formulation, all the truncation..." Adding a sentence here would make things
> more clearer.

rng = rng - (rng/ft)*fh would cause rng to be much too large. Perhaps 
you mean, "instead of rng = (rng/ft)*fh", which would be the same as the 
k>0 case, assuming fl=0. The problem with that approach is that it can 
introduce a span of the code space that isn't used, whose size is 
(rng%ft). The "normal" (i.e., Moffat, Neal, Witten) approach to handling 
that error is to special case the _last_ symbol, to include this extra 
piece of the code space. As written, this special case folds the 
truncation error, (rng%ft) == rng - (rng/ft)*ft, into the first symbol 
instead, for the reasons stated in the following paragraph, as you surmised.

How about

"Using a special case for the first symbol, rather than the last symbol, 
as is commonly done in other arithmetic coders, ensures that all the 
truncation error from the finite precision arithmetic accumulates in 
symbol 0."

along with

"It also makes some of the special-case decoding routines in Section 
4.1.2 particularly simple."


> 3) Also, you might move "After the updates... identified symbols" to the end
> of this section 4.1.1.

If you think that scans cleaner, sounds fine.

> 4) Page 19: "log"? Do you mean logarithm dualis?

Yes... I'd actually prefer an even simpler expression, as I don't think 
the spec is the place to rehash Shannon theory.

Perhaps:
"This makes the cost of coding a 0 slightly smaller, on average, than 
its estimated probability indicates and makes the cost of coding any 
other symbol slightly larger."

> Section 4.1.1.1
> ============
> 1) " Then it reads the next 8 bits of
>     input into sym, using the remaining bit from the previous input octet
>     as the high bit of sym, and the top 7 bits of the next octet as the
>     remaining bits of sym."
>
> This sentence is unclear. Which bits are read where? I would suggest to
> change it to
>
> " Then it reads in to total 8 bits from the input stream into the variable
> sym. Bit 2**7 is taken from the previous input octet has a remaining bit
> (refer e.g. to the initialization of the
>     as the high bit of sym, and the top 7 bits of the next octet as the
>     remaining bits of sym."

How about:

"Then it reads the next octet of the payload and combines it with the 
left-over bit buffered from the previous octet to form the 8-bit value 
sym. It takes the left-over bit as the high bit (bit 7) of sym, and the 
top 7 bits of the octet it just read as the other 7 bits of sym. The 
remaining bit in the octet just read is buffered for use in the next 
iteration.

> 2) I know, it does not make a difference in C. However, not everybody can
> remember the priority of operators all the time. Thus, I would suggest to
> change it to
> "((val<<8)+(255-sym))&0x7FFFFFFF." (Page 19)

That's fine.

> 3) Before "It is normal and expected that the range decoder will read
> several
>     bytes into the raw bits data (if any) at the end of the packet by the
>     time the frame is completely decoded, as illustrated in Figure 8."
> A section header is missing like "4.1.2 Reading the last bits". I also would

I'm not sure a separate section is necessary. It is simply a 
clarification of the renormalization process. If it was in a separate 
section, it would be a sub-section of Renormalization, which would make 
it 4.1.1.1.1. That may be slightly ridiculous.

> explain it better. For example
> "The Renormalization is reading continually bits from the frame until the
> last octet of the frame has been processed. After that, the range decoder
> still might need further bits to continue the decoding of the last symbols".
> These bits MUST be set to zero."

I think I actually like the current text better, as it makes explicitly 
clear that the number of bytes fed into the decoder is limited by the 
size of the frame and not the packet. I see that as an easy mistake to 
make, and doing so would cause repacketization to potentially alter the 
decoded output, which is surely undesirable.

> (Technical question: Are you really using these zero bits for compression?
> So to say, are you truncating the frame if the last symbols have an index
> k=0? Otherwise, there is no need for to demand zero with MUST).

The encoder is not normative, so I can't say what it does. For CELT or 
hybrid mode packets, truncating the frame likely won't work because it 
will change the computed bit allocation inside the frame (not to mention 
it would shift any raw bits). However, for SILK it would be perfectly 
permissible to truncate the last few octets if they happened to be zero.

In any case, I have a very strong opinion that decoder behavior should 
be specified for this case. It is _not_ clearly specified in VP8, and 
the current encoder actually stuffs an extra 32 (!) bits into every 
frame as an extremely simple (if inefficient) way of terminating the 
encoder. As a result, people are starting to rely on that particular 
encoder's behavior for error concealment (e.g., detecting corrupted 
packets), which I think is fairly terrible.

Specifying it also helps avoid a potential buffer overflow exploit, as 
it's fairly easy to construct a SILK-mode frame that will cause the 
decoder to read many more octets than are actually in the payload. If 
the behavior is mandatory to get correct decoding in the normal case, 
then it should be much harder to get wrong.

> Cont. "If the last bit of the frame use for getting raw data (see new Figure
> in Section4.1), then the range decoder will process them for decoding its
> symbols."

I can't even parse that sentence.

> Section 4.1.2
> ==========
> I am not happy with Section 4.1.2. First, you cannot understand the text
> without looking up the source code (= decompressing it in your memory).
> Also the functions are only optimizations and are not needed to understand
> the range decoder algorithm. I would suggest to move Section 4.1.2 entirely
> into the source comment making it a description of the function
> (alternative, you may move them to an appendix).

I think it's important to include. The implementation details aren't 
particularly important, but certainly the point that many of the 
divisions can be eliminated is important for someone trying to read the 
spec to get an idea of the codec's complexity. A naive reading of 4.1.1 
would suggest that two divisions are required for every symbol decoded, 
and that is certainly not the case. It's also important to make clear 
that all of these functions are equivalent to the basic decoding process 
laid out in 4.1.1. Finally, the icdf description is valuable for someone 
browsing the code and trying to figure out why none of the cumulative 
probability tables match the descriptions in the spec. There are 
comments which mention all of these things in the code, but having them 
in one central place with a more narrative description makes it easier 
to find and piece together.

If you have recommendations for improving the descriptions, I am all ears.