Re: [Cellar] Barry Leiba's Discuss on draft-ietf-cellar-ffv1-17: (with DISCUSS and COMMENT)

Dave Rice <dave@dericed.com> Fri, 30 October 2020 18:30 UTC

Return-Path: <dave@dericed.com>
X-Original-To: cellar@ietfa.amsl.com
Delivered-To: cellar@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id E8F143A10C5; Fri, 30 Oct 2020 11:30:16 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.119
X-Spam-Level:
X-Spam-Status: No, score=-1.119 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, SPF_HELO_NONE=0.001, SPF_NEUTRAL=0.779, URIBL_BLOCKED=0.001] autolearn=no autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ONQtqKPLeDJ1; Fri, 30 Oct 2020 11:30:14 -0700 (PDT)
Received: from server172-4.web-hosting.com (server172-4.web-hosting.com [68.65.122.112]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id AEA233A10A0; Fri, 30 Oct 2020 11:30:14 -0700 (PDT)
Received: from cpe-104-162-85-222.nyc.res.rr.com ([104.162.85.222]:53381 helo=[192.168.0.177]) by server172.web-hosting.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from <dave@dericed.com>) id 1kYZA5-000haH-Ae; Fri, 30 Oct 2020 14:30:13 -0400
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.4\))
From: Dave Rice <dave@dericed.com>
In-Reply-To: <160049032442.20595.4616862102756589735@ietfa.amsl.com>
Date: Fri, 30 Oct 2020 14:29:36 -0400
Cc: The IESG <iesg@ietf.org>, Michael Richardson <mcr+ietf@sandelman.ca>, draft-ietf-cellar-ffv1@ietf.org, cellar@ietf.org, cellar-chairs@ietf.org, pb@das-werkstatt.com
Content-Transfer-Encoding: quoted-printable
Message-Id: <92029D16-621F-4AE0-BC3F-03C08AB21ADE@dericed.com>
References: <160049032442.20595.4616862102756589735@ietfa.amsl.com>
To: Barry Leiba <barryleiba@computer.org>
X-Mailer: Apple Mail (2.3608.120.23.2.4)
X-OutGoing-Spam-Status: No, score=0.9
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - server172.web-hosting.com
X-AntiAbuse: Original Domain - ietf.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - dericed.com
X-Get-Message-Sender-Via: server172.web-hosting.com: authenticated_id: dave@dericed.com
X-Authenticated-Sender: server172.web-hosting.com: dave@dericed.com
X-Source:
X-Source-Args:
X-Source-Dir:
X-From-Rewrite: unmodified, already matched
Archived-At: <https://mailarchive.ietf.org/arch/msg/cellar/puRH1ttZU41CpRDGODJrcSV5x_g>
Subject: Re: [Cellar] Barry Leiba's Discuss on draft-ietf-cellar-ffv1-17: (with DISCUSS and COMMENT)
X-BeenThere: cellar@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Codec Encoding for LossLess Archiving and Realtime transmission <cellar.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/cellar>, <mailto:cellar-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cellar/>
List-Post: <mailto:cellar@ietf.org>
List-Help: <mailto:cellar-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/cellar>, <mailto:cellar-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 30 Oct 2020 18:30:17 -0000

Hi all,

I’m writing this as an update to what aspects of this DISCUSS are resolved and what may remain. I added a summary at the end, but the tl;dr is that in my opinion all discuss and comment points are resolved in our git repository but Section 3.8.1.1 could use a new look in light of the discuss comments against it.

> On Sep 19, 2020, at 12:38 AM, Barry Leiba via Datatracker <noreply@ietf.org> wrote:
> 
> Barry Leiba has entered the following ballot position for
> draft-ietf-cellar-ffv1-17: Discuss
> 
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
> 
> 
> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.
> 
> 
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-cellar-ffv1/
> 
> 
> 
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
> 
> There are a number of things I’d like to discuss, because I find them not
> understandable.  Perhaps it’s simply because I’m not a codec expert, but, while
> I understand that this is written for readers who will actually be implementing
> he FFV1 codec, some of them will also not be “experts”.  That said, I’m sure
> some of this is just a case of “give Barry some clue and it’s fine.”
> 
> — Section 2.1 —
> You use the term “symbol” here and later, without defining it, and I don’t know
> what it is.  A byte?  A character?  A string of bits?  What length?

Symbol is defined in https://github.com/FFmpeg/FFV1/commit/baa92e69bbdb5cece9f99556267906beffcd0fd2 as:

	`Symbol`: A value stored in the bitstream, which is defined and decoded through one of the methods described in [@tablePseudoCodeSymbols].

> — Section 3.8.1.1 —
> I’m not sure how to interpret the stuff in this section.  First, I don’t know
> why there are seven “figures”, with no captions nor other explanation.  Second,
> I’m having trouble making sense out of things like this:
> 
>   S_(i + 1, C_(i)) =  zero_state_(S_(i, C_(i)))  AND
>              l_(i) =  L_(i)                      AND
>              t_(i) =  R_(i) - r_(i)              <==
>              b_(i) =  0                          <==>
>              L_(i) <  R_(i) - r_(i)
> 
> Can you explain how this is meant to be read?  Maybe it’s just me, and maybe
> after you explain it I’ll whack myself on the head and say, “Doh!”

Reading this would take correlation to Section 2.2.5 (Mathematical Functions) for definitions of <== and <==> and the form of a_(b) and a_(b,c).

To read this out-loud would be like:

The i-th value of b equaling zero implies that the i-th value of L is less than the result of i-th value of R minus the i-th value of r. The reserve implication is also true. These implications both imply that the ‘i+1,C_(i)' value of S equals the ‘S_(i, C_(i))’ of zero_state as well as the i-th value of l equaling the i-th value of L as well as the i-th value of t equaling the i-th value of R minus the i-th value or r.

> Third, you say, “S_(0, i) is the i-th initial state,” but you haven’t
> previously introduced the term “state”, and I don’t know what it means.

Section 3.8 includes an informational reference to range coding. An understanding on how range coding works could make this section more clear.

More captions were added to these figures in https://github.com/FFmpeg/FFV1/commit/32cb9ea3e697036e81722f1ceced47f74b70549f and https://github.com/FFmpeg/FFV1/commit/ae3de36f528dbcef3fe5fdcc5b63bd5b06aefca0.

> — Section 3.8.1.2 —
> 
>   "get_rac" returns a boolean, computed from the bytestream as
>   described in Section 3.8.1.1.
> 
> I see nothing in Section 3.8.1.1 that describes get_rac.

We’ve added a definition https://github.com/FFmpeg/FFV1/commit/199ce1b95eb426e807ddbfd5d8338243d8458db1 and cross-references in https://github.com/FFmpeg/FFV1/commit/ae3de36f528dbcef3fe5fdcc5b63bd5b06aefca0.

> — Section 3.8.1.3 —
> 
>   At keyframes all Range coder state variables are set to their initial
>   state.
> 
> What does “at keyframes” mean?

It was presumed that ‘at keyframe’ meant a frame where the keyframe value was set to 1. This is clarified in https://github.com/FFmpeg/FFV1/commit/4b4d6f3c180fa2c4debefcc73793fc65ddd852c0.

> — Section 3.8.1.5 —
> This is just a list of numbers with no explanation.  It needs text explaining
> what it means.

An introduction to this table is added at https://github.com/FFmpeg/FFV1/commit/69dda1b8a4f02bcce47b860df83a2f26afdca909

The introduction is:
In this mode a State Transition Table is used, indicating in which state the decoder will move to, based on the current state and the value extracted from [@figureGetRacPseudoCode].

> — Section 3.8.2.2 —
> 
>   The level is identical to the predicted one.
>   The run and the first different level are coded.
> 
> What does “level” mean?  It’s not defined anywhere.

This was rewording to use the term “sample difference” rather than level at https://github.com/FFmpeg/FFV1/commit/d6fbc97aa32699d5bafba9760e4f5093fcaa6120 and to cross-reference to the section that defines sample difference.

> — Section 4.3.1 —
> 
>   "reserved_for_future_use" has semantics that are reserved for future
>   use.
> 
> Yes, that seems rather obvious, though it’s oddly worded.  But then you say
> what to do with “this value”, and nowhere do you say what the value is.  How
> does one know?

This was clarified in https://github.com/FFmpeg/FFV1/commit/e7f6b342ae5e1c326357479bbbfb2934fe6d83f1.

> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
> 
> Some editorial comments that I ask you to please consider:
> 
> General: It’s jarring to see the defined terms, such as “Sample” and “Pixel” in
> quotes every time they’re used.  More common practice is to quote them when
> they’re first defined, and then to just use the capitalization to show that
> they’re defined terms.  I find that the quotes affect the readability and make
> it somewhat more of a challenge to read.

We agreed. These excessive quotes were removed in https://github.com/FFmpeg/FFV1/commit/f0eb24b7820889fbd93b68d1e707d4380a90ded1.

> — Section 2.1 —
> 
>   "Plane": A discrete component of a static image comprised of
> 
> Total, total nit, and pet peeve: “composed of” or “comprising”, but not
> “comprised of”.

Fixed as suggested in https://github.com/FFmpeg/FFV1/commit/fe219d9b2afff5583b40ea597918b5633d1188ea.

> — Section 2.2.2 —
> You’re missing “a % b”, which is referenced in Section 2.2.6.

Nice catch. We removed this in https://github.com/FFmpeg/FFV1/commit/34ec93cdacef056c6cb53ca3120d5a021acfb911 as it is unused in the document.

> — Section 2.2.4 —
> 
>   "a > b" means a is greater than b.
>   "a >= b" means a is greater than or equal to b.
>   "a < b" means a is less than b.
>   "a <= b" means a is less than or equal b.
>   "a == b" means a is equal to b.
>   "a != b" means a is not equal to b.
> 
> I don’t think anyone will misunderstand this, so it’s a very small thing, but…
> might this be better said as, ‘ "a > b" is true when a is greater than b. ‘
> (and similarly for the rest)?

Reworded as suggested in https://github.com/FFmpeg/FFV1/commit/6ef98dc138d34966cc686f64d1f4224202e6b263.

> — Section 2.2.5 —
> 
>   "min(a,b)" means the smallest of two values a and b.
>   "max(a,b)" means the largest of two values a and b.
> 
> Nit: “smaller” and “larger”, for only two.

Fixed as suggested in https://github.com/FFmpeg/FFV1/commit/fe219d9b2afff5583b40ea597918b5633d1188ea.

> — Section 2.2.7 —
> 
>   "a...b" means any value starting from a to b, inclusive.
> 
> “starting”?  I would leave that word out.

Fixed as suggested in https://github.com/FFmpeg/FFV1/commit/fe219d9b2afff5583b40ea597918b5633d1188ea.

> — Section 2.2.9.1 —
> 
>   "remaining_bits_in_bitstream( )" means the count of remaining bits
> 
> Section 2.2.9.3 uses remaining_bits_in_bitstream( NumBytes ), as do Sections
> 4.4 and 4.5.  This should too, yes?

Yes. Fixed in https://github.com/FFmpeg/FFV1/commit/5685a9e8c35ce5ca8420b97cbb336aad8fa5ad0c and https://github.com/FFmpeg/FFV1/commit/5056a894d16c517a63aaa4925f8ca1a91803f205.

> — Section 3.2 —
> 
>       Figure 3: A depiction of how relatively positions Samples are
>                      references within this document.
> 
> I think you mean “positioned” and “referenced”, yes?

Yes. Fixed in https://github.com/FFmpeg/FFV1/commit/fe219d9b2afff5583b40ea597918b5633d1188ea.

> — Section 3.3 —
> 
>   Background: a two's complement signed 16-bit signed integer was used
> 
> I’d remove the first “signed”.

Fixed in https://github.com/FFmpeg/FFV1/commit/55c200e145ed9067459d31910d3eda5921097098.

> — Section 3.8.2 —
> 
>   The end of the bitstream of the "Frame" is filled with 0-bits until
>   that the bitstream contains a multiple of 8 bits.
> 
> Is “that” just an extra word that needs to be struck, or is something else
> wrong here?  And is this meant to be what we usually call “padding”?

Fixed in https://github.com/FFmpeg/FFV1/commit/fe219d9b2afff5583b40ea597918b5633d1188ea.

> — Section 4 —
> 
>   Within the following sub-sections, pseudo-code is used to explain the
>   structure of each FFV1 bitstream component, as described in
>   Section 2.2.1.
> 
> I read this as saying that Section 2.2.1 describes the structure of each FFV1
> bitstream component, until I went back there and looked.  I suggest reordering
> the sentence to help the reader a bit:
> 
> NEW
>   Within the following sub-sections, pseudo-code is used, as described
>   in Section 2.2.1, to explain the structure of each FFV1 bitstream
>   component.
> END

The suggested was taken and used in https://github.com/FFmpeg/FFV1/commit/ab5a5c944d518ad72a982118b3c90b99a78d5d7c.

> — Section 4.5 —
> 
>   Encoders SHOULD NOT fill these bits.
>   Decoders SHOULD ignore these bits.
> 
> Before this you describe two kinds of bits, and it’s not clear whether these
> are talking about both kinds, or just the latter.  I think changing “these” to
> “reserved” is better.

Adjusted in https://github.com/FFmpeg/FFV1/commit/12bfc977f5f868404b69f434501f011a850ee9c0.

> — Section 6 —
> 
>   A frequent security problem in image
>   and video codecs is also to not check for integer overflows, for
>   example to allocate "frame_pixel_width * frame_pixel_height" in
>   "Pixel" count computations without considering that the
>   multiplication result may have overflowed the arithmetic types range.
> 
> This is somewhat complex and awkward (arguably a comma splice), and I suggest
> splitting the sentence thus:
> 
> NEW
>   A frequent security problem in image
>   and video codecs is failure to check for integer overflows.  An
>   example is allocating "frame_pixel_width * frame_pixel_height" in
>   "Pixel" count computations without considering that the
>   multiplication result may have overflowed the arithmetic types range.
> END

Fixed in https://github.com/FFmpeg/FFV1/commit/741f67b405b2bfbe683a2a3d9dc83468bca960cf.

> I might also make the “must” in the last sentence of that paragraph be “MUST”.

Fixed in https://github.com/FFmpeg/FFV1/commit/50d9e14ad889cc833d65a905a01637ea8b526a45.

summary:

In my opinion, all discuss and comments listed in this review have been resolved with the exception of the status of comments on Section 3.8.1.1 being less clear. I’m pasting in the current state of Section 3.8.1.1 for consideration:

> 3.8.1.1.  Range Binary Values
> 
>    To encode binary digits efficiently a Range coder is used.  C_(i) is
>    the i-th Context.  B_(i) is the i-th byte of the bytestream. b_(i) is
>    the i-th Range coded binary value, S_(0, i) is the i-th initial
>    state.  The length of the bytestream encoding n binary symbols is
>    j_(n) bytes.
> 
>    r_(i) = floor( ( R_(i) * S_(i, C_(i)) ) / 2 ^ 8 )
> 
>         Figure 11: A formula of the read of a binary value in Range
>                                 Binary mode.
> 
>    S_(i + 1, C_(i)) =  zero_state_(S_(i, C_(i)))  AND
>               l_(i) =  L_(i)                      AND
>               t_(i) =  R_(i) - r_(i)              <==
>               b_(i) =  0                          <==>
>               L_(i) <  R_(i) - r_(i)
> 
>    S_(i + 1, C_(i)) =  one_state_(S_(i, C_(i)))   AND
>               l_(i) =  L_(i) - R_(i) + r_(i)      AND
>               t_(i) =  r_(i)                      <==
>               b_(i) =  1                          <==>
>               L_(i) >= R_(i) - r_(i)
> 
>                                  Figure 12
> 
>    S_(i + 1, k) = S_(i, k) <== C_(i) != k
> 
>      Figure 13: The "i+1,k"-th State is equal to the "i,k"-th State if
>          the value of "k" is unequal to the i-th value of Context.
> 
>    R_(i + 1) =  2 ^ 8 * t_(i)                     AND
>    L_(i + 1) =  2 ^ 8 * l_(i) + B_(j_(i))         AND
>    j_(i + 1) =  j_(i) + 1                         <==
>    t_(i)     <  2 ^ 8
> 
>    R_(i + 1) =  t_(i)                             AND
>    L_(i + 1) =  l_(i)                             AND
>    j_(i + 1) =  j_(i)                             <==
>    t_(i)     >= 2 ^ 8
> 
>      Figure 14: The "i+1"-th values for "Range", "Low", and the length
>      of the bytestream encoding are conditionally set depending on the
>                             "i-th" value of "t".
> 
>    R_(0) = 65280
> 
>                  Figure 15: The initial value for "Range".
> 
>    L_(0) = 2 ^ 8 * B_(0) + B_(1)
> 
>        Figure 16: The initial value for "Low" is set according to the
>                      first two bytes of the bytestream.
> 
>    j_(0) = 2
> 
>           Figure 17: The initial value for "j", the length of the
>                             bytestream encoding.
> 
>        range = 0xFF00;
>        end   = 0;
>        low   = get_bits(16);
>        if (low >= range) {
>            low = range;
>            end = 1;
>        }
> 
>        Figure 18: A pseudo-code description of the initial states in
>                              Range Binary mode.
> 
>    refill() {
>        if (range < 256) {
>            range = range * 256;
>            low   = low * 256;
>            if (!end) {
>                c.low += get_bits(8);
>                if (remaining_bits_in_bitstream( NumBytes ) == 0) {
>                    end = 1;
>                }
>            }
>        }
>    }
> 
>         Figure 19: A pseudo-code description of refilling the Range
>                          Binary Value coder buffer.
> 
>    get_rac(state) {
>        rangeoff  = (range * state) / 256;
>        range    -= rangeoff;
>        if (low < range) {
>            state = zero_state[state];
>            refill();
>            return 0;
>        } else {
>            low   -= range;
>            state  = one_state[state];
>            range  = rangeoff;
>            refill();
>            return 1;
>        }
>    }
> 
>         Figure 20: A pseudo-code description of the read of a binary
>                         value in Range Binary mode.


Best Regards,
Dave Rice