Re: [Cfrg] RFC Draft for KangarooTwelve XOF draft-viguier-kangarootwelve-00

David Wong <davidwong.crypto@gmail.com> Thu, 15 June 2017 17:27 UTC

MIME-Version: 1.0
In-Reply-To: <be067fcc-68e8-0a19-3cb6-2510e3f1caa5@science.ru.nl>
References: <be067fcc-68e8-0a19-3cb6-2510e3f1caa5@science.ru.nl>
From: David Wong <davidwong.crypto@gmail.com>
Date: Thu, 15 Jun 2017 18:27:17 +0100
Message-ID: <CAK3aN2rPH4PJJAqnuTE8RQNa+=nvqWbf6PNm-s9bAGe=sog=Fg@mail.gmail.com>
To: Benoît Viguier <b.viguier@science.ru.nl>
Cc: cfrg@irtf.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/DpQlbgWsDOnxJnyBUnZMQdqL3mQ>
Subject: Re: [Cfrg] RFC Draft for KangarooTwelve XOF draft-viguier-kangarootwelve-00
Precedence: list

Hey Benoit,

I'm happy to see that the first draft is done!

The whole thing looks pretty good to me. I have some minor feedback as
well as a major one right below:

I think the specification mixes implementation tricks and the real
specification keccak. If you're planning on giving (reversed) values
in hexadecimal for the dsByte and the padding, I think it is worth
specifying the sponge function differently with inversed operations in
the permutation and a transformation from input to lanes, relevant to
the tricks that implementations use. This would also mean getting rid
of any LSB 0 bit-numbering mention in the convention section. I think
this RFC would be a really good opportunity to clear things up and
simplify the implementers' job.

I would start by defining F as F(input, domain_separation,
output_length). Then the following algorithm for the function F:

// Padding
1. append the domain_separation byte to the input
2. append zeros to the input until it becomes a multiple of the rate (168 bytes)
3. XOR the last byte of the padded input with 0x80

// XOR'ing
4. split the input in n blocks of 168 bytes (rate size): block_1, ..., block_n
5. For each block, apply the Absorb algorithm
      Absorb(block_1, state); ...; Absorb(block_n, state)
      (note that each Absorb call is taking an updated state from the
previous Absorb call)

// Output'ing
6. concatenate the outputs of Squeeze algorithm until the amount of
bytes collected is greater or equal
   to output_length
      output = Squeeze(state) || ... || Squeeze(state)
      (note that each Squeeze call is taking an updated state from the
previous Squeeze call)
7. truncate the output to the size of output_length and return the result


// Absorb algorithm
Absorb(input, state) takes an input of 168 bytes, the state of the
sponge and updates the state.

1. Split the input in 21 8-byte blocks: input_1, ..., input_21; such that
   input_i = {b_i_0, b_i_1, b_i_2, b_i_3, b_i_4, b_i_5, b_i_6, b_i_7}
2. For each input_i, reverse the order of the bytes, such that
   reversed_input_i = {b_i_7, b_i_6, b_i_5, b_i_4, b_i_3, b_i_2, b_i_1, b_i_0}
3. For the first 21 "public" lanes of the sponge seen as
   Lane_i = {L_i_0, L_i_1, L_i_2, L_i_3, L_i_4, L_i_5, L_i_6, L_i_7},
   XOR each reversed_input_i against each Lane_i, such that
   Lane_i = {L_i_0 + b_i_7, L_i_1 + b_i_6, L_i_2 + b_i_5, L_i_3 +
b_i_4, L_i_4 + b_i_3, L_i_5 + b_i_2, L_i_6 + b_i_1, L_i_7 + b_i_0}
4. Update the sponge state by calling the permutation with only the
last 12 rounds (see Appendix A)

// Squeeze algorithm
output = Squeeze(state) takes the state of the sponge and updates the
state, while returning a 168 bytes output

1. Copy out the 21 "public" lanes of the sponge state Lane_1, ...,
Lane_21 seen as
   Lane_i = {L_i_0, L_i_1, L_i_2, L_i_3, L_i_4, L_i_5, L_i_6, L_i_7}
2. Reverse the bytes of each copied lane:
   Reversed_Lane_i = {L_i_7, L_i_6, L_i_5, L_i_4, L_i_3, L_i_2, L_i_1, L_i_0}
3. Update the sponge state by calling the permutation with only the
last 12 rounds (see Appendix A)
4. Return Reversed_Lane_1 || ... || Reversed_Lane_21

Now, in the appendix A.1. of the permutation, I would be careful to
define every operation in the reverse direction. For example ROL64()
is done by a left shift << instead of a right shift >>. It should also
be mentionned that operations have been reversed since the lanes have
been reversed.

I think this is important because the point of such an RFC is to dumb
down the technical details and facilitate the implementation. A lot of
shortcuts can be taken here because we do not care about other
rate+capacity variants, or non-multiple of bytes.

I have some more feedback about details here and there. Feel free to
ignore or acknowledge as you deem relevant.

→ The abstract

  It could contain more information on the use-case rather than on
what type of primitive it is.
  I'm saying that because I often run into obscure RFC and have a hard
time understanding from the abstract what they are for. Here is how I
would think is important:

  * K12          → KangarooTwelve is an eXtendable Output Function (XOF)
  * trust        → The design is based on SHA-3 and done by the same inventors
  * XOF          → It provides an arbitrary output length that can be
truncated to any desired size
  * fast         → Its performs better than SHA-3, SHA-2 and SHA-1
  * big inputs   → It takes advantage of tree hashing to efficiently
parallelize hashing of big files
  * small inputs → Due to Sakura Coding and the padding, no penalty
exists for small inputs

  This could be written as:

  This document defines the KangarooTwelve eXtendable Output Function (XOF),
  a hash function based on SHA-3 and designed by the same inventors that
  provides outputs of arbitrary lengths. It performs better than SHA-3, SHA-2
  and SHA-1; taking advantage of parallelism to speed up hashing of big files
  while still being efficient by avoiding penalties for small inputs.

→ The introduction

  - It could use the same spiel about the no-penalty for small inputs.
  - You mention Keccak without defining what it is (you mention
Keccak-p before though).

→ Section 1.1. Conventions

  - Do you really need the MUST, SHOULD, etc... keywords?
    I'm not sure any of these makes too much sense in K12's context.
  - "LSB 0 bit convention" is also called "LSB 0 bit-numbering", which
might more formal and fit to the context?

> |s|  denotes the length of a byte string "s".  For example, |"FF FF"| = 2.

  I think this should emphasize the unit: "denotes the length of a
value in 'bytes'"

→ Section 2.1. Inner function: F

> Keccak-p[1600,n_r=12]

  I think it should be noted that these are the "last" 12 rounds of
the permutation with 24 rounds.

  Also I'd prefer a function F defined as F(input, domain_separation,
output_length) to simplify the
  other parts of the RFC.

> The input string S SHOULD be represented as a pair (Sbytes, dS),

  Not sure why the "SHOULD" or why the mention of a "pair". What's a
pair in C :) ?

>  multi-rate padding rule pad10*1

  - You mention the multi-rate padding without defining it.
  - Do we really need these explanations on the computation of `dS`?
Just giving the values should be enough imo

→ Section 2.2. Tree hashing over F

  - Could this section simply be renamed "Hashing"?
  - I would also enumerate the steps and simplify a few things. For example:

  KangarooTwelve( input, custom_string, output_length ) takes any
input to be hashed, a
  (optionally empty) customization string and a desired output length.
It then execute
  the following steps:

  1. Concatenate input, custom_string and right_encode(|custom_string|).
       S = input || custom_string || right_encode(|custom_string|)
  2. If the byte length of S is less than 8192 bytes, then run the
sponge function
     on S with dsByte = 0x07 and ignore the rest of these steps.
Otherwise skip this
     step and go directly to step 3.
       F(S, dsByte=0x07, output_length)
  3. Split the concatenated value into n chunks of 8192 bytes, where
the last one can
     be shorter or equal to 8192 bytes). For example with 5 chunks:
       S = S_0 || .. || S_4
       with |S_0| = |S_1| = |S_2| = |S_3| = 8192 bytes
       and |S_4| <= 8192 bytes
  4. Compute CV_1 to CV_4, optionally using the parallelism supported
by your architecture.
     (For example this is achievable on Intel architecture by using
SIMD instructions.)
       Node_i  = S_i || `110`
       CV_i    = F(Node_i, dsByte=0x0B, output_length=32)
  5. To compute the final input, start by contatenating the first
chunk with the hexadecimal string "03 00 00 00 00 00 00 00"
       final_input = S_0 || "03 00 00 00 00 00 00 00"
  6. Append the concatenation of all the the CV_i to the final input.
       final_input = final_input || CV_1 || CV_2 || CV_3 || CV_4
  7. Append the encoding of the number of chunks minus one (4 in our
example) to the final input.
       final_input = final_input || right_encoding(n-1)
  8. Append the hexadecimal string "FF FF" to the final input.
       final_input = final_input || "FF FF"
  9. Run the sponge function on the final input with dsByte = 0x06
       F(final_input, dsByte=0x06, output_length)

> For |S| > 8192 bytes

  I think diagrams should be in the appendix (or at least in a
different section). Also a diagram for `|S| <= 8192 bytes` would be
nice as well :)


                                 +--------------+
                                 |    message   |
                                 +--------------+
                                        ||
                                 +--------------+
                                 | custom_string|
                                 +--------------+
                                        ||
                     +-----------------------------------+  F(dsByte=0x07)
                     | right_encode(|custom_string|)
|-----------------> output
                     +-----------------------------------+


→ 2.3.  right_encode( x )

> x = x / 256

  I don't think you define the division here. I would define it in
this section as well as in the convention section (1.1):

     a / b outputs the quotient of the operation: a divided by b. For
example 10 = 3 * 9 + 1, so 3 would be outputed
     by the operation 10 / 9. Furthermore, since 240 = 0 * 256 + 240,
0 would be the output of 240 / 256.

→ Section 3. Test Vectors

  These should probably be in an Appendix.

→ Appendix A.3.

  You could also think of including (or replacing yours with) the
python function there:
  https://github.com/gvanas/KeccakCodePackage/blob/master/Standalone/KangarooTwelve-reference/K12.py#L98:L113
  But I think it should be noted that this is not taking advantage of
parallelism.



I probably have more, but this is starting to get huge so... I'll wait
until you tell me if I'm being unreasonable.

Thanks again for your work Benoit!

Regards,
David

[Cfrg] RFC Draft for KangarooTwelve XOF draft-vig… Benoît Viguier
Re: [Cfrg] RFC Draft for KangarooTwelve XOF draft… David Wong