[Cfrg] Re: Comments on SIV and draft-dharkins-siv-aes-00

  Hi David,

On Thu, October 18, 2007 1:41 pm, mcgrew wrote:
> Hi Dan,
>
> I'm sorry to take so long in getting back to you.  The new draft looks
> great - thanks for carrying it forward.  I have a bunch of comments,
> some on the document, and some on SIV itself.

  I'm working on an -01 version of the draft right now so your comments
are timely.

> First, a bunch of detailed comments.
>
> The abstract says that "SIV takes a key, a plaintext, and a vector of
> data".  I think the term "vector" will not be intuitive to many
> readers, so perhaps it would make sense to say that the vector is "an
> array of variable-length octet strings", or something like that.  I
> mean to suggest that you add text describing what is meant, rather than
> changing the terminology from what Phil and Tom wrote.

  OK

> For the key derivation application (Section 1.3.3), what would the SIV
> plaintext input be equal to?  Would it be omitted?

  The key derivation application uses S2V, the vectorized PRF component
of SIV. There is no encryption step.
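For reference, a minimal Python sketch of S2V used on its own as a vector-input PRF, following the S2V structure in the SIV draft (published later as RFC 5297). One substitution is made deliberately: the draft's PRF is AES-CMAC, but this sketch uses HMAC-SHA256 truncated to 128 bits so that it runs with only the Python standard library, so its outputs will not match any SIV test vectors. The function names and the KDF inputs at the bottom are illustrative only.

```python
import hashlib
import hmac

BLOCK = 16  # S2V operates on 128-bit blocks


def prf(key: bytes, data: bytes) -> bytes:
    # STAND-IN: the draft uses AES-CMAC here. Truncated HMAC-SHA256 is used
    # only so this sketch runs with the Python standard library.
    return hmac.new(key, data, hashlib.sha256).digest()[:BLOCK]


def dbl(s: bytes) -> bytes:
    # Multiply by x in GF(2^128): shift left one bit; if a bit falls off
    # the top, reduce by XORing in 0x87.
    n = int.from_bytes(s, "big") << 1
    if n >> 128:
        n = (n & ((1 << 128) - 1)) ^ 0x87
    return n.to_bytes(BLOCK, "big")


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pad(x: bytes) -> bytes:
    # "X10*": append a 1 bit, then 0 bits, up to one full block.
    return x + b"\x80" + bytes(BLOCK - len(x) - 1)


def s2v(key: bytes, strings: list[bytes]) -> bytes:
    """Vector-input PRF: no plaintext, no encryption step."""
    if not strings:
        return prf(key, (1).to_bytes(BLOCK, "big"))
    d = prf(key, bytes(BLOCK))
    for s in strings[:-1]:
        d = xor(dbl(d), prf(key, s))
    last = strings[-1]
    if len(last) >= BLOCK:
        # "xorend": XOR D into the final 16 bytes of the last string.
        t = last[:-BLOCK] + xor(last[-BLOCK:], d)
    else:
        t = xor(dbl(d), pad(last))
    return prf(key, t)


# Key derivation in the style of Section 1.3.3 (inputs are made up):
kdk = bytes(range(16))
derived_key = s2v(kdk, [b"label", b"context", bytes.fromhex("001122334455")])
```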

> Also, I would guess that SIV-based key derivation would only be
> appropriate for deriving keys from a given key, and that it may not be
> suitable for use in deriving keys from data that is unpredictable but
> not uniformly random, as is used e.g. in Diffie-Hellman.  At least, I
> believe that this is outside of the scope of what is claimed in the
> security analysis, and it would make sense to document that (after
> verifying with Phil and Tom).

  Hopefully my explanation above addresses this comment. I will try to
be more explicit in 1.3.3 that it is S2V being used for key derivation.

> I think that it might be useful to help explain the vector of inputs by
> using an analogy to the POSIX "iovec" or scatter/gather functions readv
> and writev; these functions also allow the user to avoid
> data-marshalling, and they should be familiar to many implementers.
> Of course, the way that readv and writev work doesn't depend on the way
> that data is broken into smaller elements, but SIV does.

  That's a good idea. readv and writev seem to be analogous. Each iovec
structure represents an AD input and the array of such structures represents
the vector of inputs. I will try to come up with some appropriate verbiage.
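The difference David points out can be made concrete. The Python sketch below treats a list of buffers like an iovec array and feeds them in order to an incremental MAC, as writev() would gather them. HMAC-SHA256 merely stands in for any MAC with an update call, and `gather_mac` is a hypothetical helper. Only the concatenation is authenticated, so `[b"ab", b"c"]` and `[b"a", b"bc"]` get the same tag; an S2V-style vector input instead processes each element separately, so the split itself is authenticated.

```python
import hashlib
import hmac


def gather_mac(key: bytes, iov: list[bytes]) -> bytes:
    """MAC a list of buffers the way writev() gathers them: in order,
    with no record of where one buffer ends and the next begins."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for buf in iov:
        mac.update(buf)
    return mac.digest()


key = b"k" * 32
# Same bytes, different split: the gathered MAC cannot tell them apart.
same = gather_mac(key, [b"ab", b"c"]) == gather_mac(key, [b"a", b"bc"])
```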

> Does S2V mean "vector to string"?  Would "V2S" be sensible?

  "string to vector"

> Section 1.3.4 typo - "troughput".  Also, it might be useful to provide
> the detail that SIV requires two passes over the data during an
> encryption operation, and thus is less suitable for pipelined hardware
> implementations.

  Pipelined hardware implementations are definitely not the place for
something like SIV. I see the difference more as one of "control plane"
versus "data plane" applications. SIV is more appropriate for a control
plane application, which is typically something in user space calling into a
cryptographic library to obtain encryption services. In such a situation
the application developer may not be aware of the requirements surrounding
nonce use for a cipher or may miss a subtle nuance in those requirements
and not be able to ensure the security of the application. A data plane
application of an AEAD cipher would typically be able to control the
nonce space (along the lines of something like what SP 800-38D requires).

  The fact that SIV requires 2 passes over the data while something like GCM
only requires 1 really just underscores for me the appropriateness of
the distinction above.

  I'll mention 2 passes in that section.

> Notation "X10*" - might be notationally clearer to define p(X) as a
> padding function, since "X10" looks like a variable name.

  OK
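For illustration, David's suggested padding function could be written as below, with lengths in bits and \(\|\) denoting concatenation. This is one possible rendering of the draft's "X10*" notation; as in S2V, it applies only to strings shorter than one 128-bit block.

```latex
p(X) = X \,\|\, 1 \,\|\, 0^{127 - |X|}, \qquad 0 \le |X| < 128
```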

> I like the compatibility between SIV-CTR and typical CTR implementations.
>
> Sections 3 and 6 define how to use SIV as a nonce-based AEAD, and how
> to use it as such in the context of [AEAD].  But I think that a bit
> more specificity is needed here.  Section 3 seems to allow multiple
> "associated data" inputs, while Section 6 will need to require that
> there is just a single AD input.  So I think that Section 3 needs to
> add a definition that's specific to the use of SIV together with
> [AEAD].

  It's a shame that [AEAD] requires a single AD input. I know Phil has
commented that it should allow multiple inputs. Your response was that
it is too late. Is it? [AEAD] is still an I-D.

  [AEAD] is supposed to provide a generic interface into AEAD cipher modes.
It doesn't as long as it constrains valid modes.

  A single AD input is the degenerate case. SIV can handle a single AD
input just fine but the generic interface to AEAD cipher modes should
not force it.

  I don't think the changes to [AEAD] needed to remove the limitation on
AD inputs are significant, and I'd be happy to suggest text on how to do
that if you're willing to produce another rev of the draft.
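On the SIV-CTR compatibility David mentions earlier: in RFC 5297 (the published form of this draft), the synthetic IV V becomes the initial counter Q by clearing bits 63 and 31 (rightmost bit = bit 0), i.e. the top bit of each of the two low-order 32-bit words. The Python sketch below, with hypothetical helper names, shows the effect: because the low 32-bit word starts below 2^31, a CTR implementation that increments only the low 32 bits cannot wrap for at least 2^31 blocks and so agrees with a full 128-bit increment; clearing bit 63 gives the same guarantee to 64-bit-increment implementations.

```python
def siv_ctr_initial(v: bytes) -> bytes:
    # Q = V AND (1^64 || 0 || 1^31 || 0 || 1^31): clear bits 63 and 31
    # (rightmost bit = bit 0), i.e. the top bit of bytes 8 and 12.
    if len(v) != 16:
        raise ValueError("V must be 16 bytes")
    q = bytearray(v)
    q[8] &= 0x7F
    q[12] &= 0x7F
    return bytes(q)


def inc32(block: bytes) -> bytes:
    # A "typical" CTR increment: only the low 32 bits, modulo 2^32.
    low = (int.from_bytes(block[12:], "big") + 1) & 0xFFFFFFFF
    return block[:12] + low.to_bytes(4, "big")


def inc128(block: bytes) -> bytes:
    # A full 128-bit increment, modulo 2^128.
    return ((int.from_bytes(block, "big") + 1) % (1 << 128)).to_bytes(16, "big")


v = b"\xff" * 16  # worst-case synthetic IV
q = siv_ctr_initial(v)
```

With the masked counter, `inc32(q) == inc128(q)`; with the raw all-ones V, the 32-bit increment wraps and the two disagree.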

> Next, some higher-level comments.
>
> First, what's the motivation for key wrapping?   This is an important
> question that a lot of people have wrestled with.  I understand from
> Section 1.3.1 that nonceless AEAD is valuable because there are
> existing protocols that do not make use of nonces, so SIV's capability
> for nonceless AEAD enables it to be easily adopted by these protocols.
> This is a very good point.  Nonetheless, it does not address the
> question of "when should a user use nonceless AEAD?" outside of those
> "legacy" cases.   I would expect that we would want to provide guidance
> that users SHOULD use nonces wherever possible, but MAY otherwise do
> without nonces.  (Perhaps there should be an exception for cases in
> which determinism is essential, e.g. database applications in which
> plaintext-to-ciphertext mapping must be deterministic.  But this is
> clearly a special case.)

  I think the motivation is to provide deterministic authenticated
encryption for specialized data, such as cryptographic keys. The American
Standards Committee Working Group X9F1 has come up with a draft standard
for such a problem. S/MIME has RFCs on that problem. I do mention that in
the draft.

  I'm a little reluctant to jump into that brier patch though. [DAE] does
a very nice treatment of the reasons behind, and requirements for, key
wrapping both informally (in the introduction) and formally (in appendix
C). I will try to address your comment by pointing readers to [DAE] and
X9F1. Would that be acceptable?

> Second, I'm skeptical about the value of the vector input, so I suggest
> that more motivation, explanation, and an example usage or two be
> added to the draft.  I'll summarize my skepticism below in the hope
> that it will be helpful.

  OK, I'll come up with an example on using a vector of inputs.

> As I understand it, the two benefits of the vector-input are that it
> eliminates the need for the user to marshal multiple inputs into a
> single input, and that it offers performance advantages in those cases
> that there are repeated invocations of the crypto function in which
> some of the inputs remain constant.
>
> Regarding performance, any AEAD algorithm can be made to support a
> scatter/gather or init/update/final interface as per RFC1321.  It is a
> conventional technique to copy the intermediate state after an update
> operation, and then use it to process different suffixes.   Beyond
> that, there are functions that support an "incremental" interface, in
> the sense of "Incremental Cryptography and Application to Virus
> Protection" (27th ACM Symposium on the Theory of Computing, May 1995).
> GMAC, and many other functions that make use of universal hashing, can
> be used in this way.  So it is possible to reap the performance benefit
> claimed for SIV with some existing functions, and it's possible to
> realize the performance advantage without using a vector of inputs.

  You seem to be arguing against the novelty of this idea but not against
the idea itself.
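David's point about copying intermediate state works with any init/update/final interface. This Python sketch uses the standard-library `hmac` object, whose `copy()` method clones the state; HMAC-SHA256 is only a convenient example (it is not an AEAD), and `tags_with_shared_prefix` is a hypothetical helper. The constant prefix is absorbed once, then each suffix is finished from a clone.

```python
import hashlib
import hmac


def tags_with_shared_prefix(key: bytes, prefix: bytes, suffixes: list[bytes]) -> list[bytes]:
    # init + update(prefix) happens once ...
    base = hmac.new(key, prefix, hashlib.sha256)
    tags = []
    for suffix in suffixes:
        h = base.copy()   # ... then clone the intermediate state
        h.update(suffix)  # and finish each message separately
        tags.append(h.digest())
    return tags


key = b"k" * 32
tags = tags_with_shared_prefix(key, b"constant header|", [b"msg-1", b"msg-2"])
```

Each tag equals a one-shot HMAC over prefix plus suffix; only the work on the prefix is shared.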

  It is a natural way to deal with AD. In [AEAD] you say,

    "When using an AEAD to secure a network protocol, for example,
     this input could include addresses, ports, sequence numbers,
     protocol version numbers, and other fields that indicate how the
     plaintext or ciphertext should be handled, forwarded, or processed."

That's, potentially, several distinct pieces of information. Some may
be contiguous (addresses, ports and sequence numbers might all be in a
single header) but others might not be. Some AD might not even transit
with the authenticated and encrypted data.

  I guess I can try to highlight this concept but it seems that what
should really be explained is why these multiple distinct pieces of
information have to be viewed as a single component input to an AEAD
cipher mode.

>                                     As a concrete example, one could
> replace the use of AES-CMAC on a vector of inputs in SIV with a
> polynomial hash function (such as GHASH, the component of GCM/GMAC)
> applied to a single input.  This would allow even *more* performance
> optimizations (in particular, it allows optimizations whenever there
> are repeated invocations of the crypto function in which *any part* of
> the input remains constant).

  This is the second time I have heard this fantastic idea.
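The incremental property David attributes to polynomial hashes such as GHASH can be illustrated with a toy. The Python sketch below evaluates a polynomial hash by Horner's rule and then patches one block in place with a single modular exponentiation instead of rehashing everything. It deliberately substitutes arithmetic modulo the prime 2^127 - 1 for GHASH's GF(2^128), the hash key is made up, and a bare polynomial hash is not a MAC (GMAC additionally masks its output); it shows only the algebra.

```python
P = (1 << 127) - 1  # prime modulus; STAND-IN for GHASH's GF(2^128)


def poly_hash(h: int, blocks: list[int]) -> int:
    # Horner's rule: returns sum(blocks[j] * h**(n - j)) mod P, j 0-indexed.
    acc = 0
    for m in blocks:
        acc = (acc + m) * h % P
    return acc


def replace_block(tag: int, h: int, n: int, j: int, old: int, new: int) -> int:
    # Block j contributes blocks[j] * h**(n - j), so swapping its value only
    # needs the difference times that power of h, not a full rehash.
    return (tag + (new - old) * pow(h, n - j, P)) % P


h = 0x0123456789ABCDEF  # hypothetical hash key
msg = [11, 22, 33, 44]
tag = poly_hash(h, msg)
tag2 = replace_block(tag, h, len(msg), 2, old=33, new=99)
```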

> In practice, it seems that these optimizations aren't used so much.  I
> believe that the reason is because the additional complexity doesn't
> seem warranted when the amount of data that stays the same across
> invocations is small compared to the entire data.  FWIW, I do think
> that there are applications for incremental message authentication
> within the area of security for data-at-rest.
>
> The key derivation example that's used to motivate the
> vector-of-inputs points out that in key derivation applications, it is
> common to have multiple inputs to the KDF, some of which stay fixed
> across multiple invocations of the KDF algorithm.  This is true, though
> I question the performance gains, because I suspect that in the KDF
> case, there are many small inputs.

  I attempted to get an S2V-based KDF adopted by the IEEE 802.11r (Fast
Handoff) Task Group. The S2V KDF (using AES-CMAC) was four times faster
than the HMAC-SHA256 KDF that 11r uses. This was a real-world example in
which the context and label being bound into the derived key were
constant and all that changed was the MAC address of the AP to which the
new key was to be delivered.

>               I suspect that the vector approach would perform worse
> than the standard approach if there were ten one-byte inputs, for
> example.  Additionally, I wonder why we're that interested in designing
> SIV to work well within some existing KDF models (why not just design a
> dedicated KDF?), and it seems to me that SIV doesn't actually fit all of
> the models.  It seems to not be usable within draft-dang-nistkdf,
> because SIV requires a key as a separate and distinct input, which that
> draft does not provide.  Also, as noted above, SIV-based KDFs may not be
> suitable for use in deriving keys from Diffie-Hellman (or at least this
> use would be outside the "warranty" provided by the security analysis).

  As I mentioned above it is the S2V component of SIV that is being
promoted as a KDF. The KDF used by IEEE 802.11r is modeled along the
lines of a draft-dang-nistkdf KDF and S2V easily replaced it with, as
I noted, a 4x performance increase.
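To illustrate the caching that makes S2V attractive in the 802.11r case: S2V folds every input except the last into a single 128-bit value D, so when the leading inputs (here, hypothetically, a label and a context) stay fixed, D can be computed once and only the final input (the AP's MAC address) is processed per derivation. This Python sketch assumes that input ordering and again substitutes truncated HMAC-SHA256 for AES-CMAC so it runs with the standard library; it is not the 802.11r KDF and reproduces no published test vectors.

```python
import hashlib
import hmac

BLOCK = 16  # 128-bit blocks


def prf(key: bytes, data: bytes) -> bytes:
    # STAND-IN for AES-CMAC (truncated HMAC-SHA256).
    return hmac.new(key, data, hashlib.sha256).digest()[:BLOCK]


def dbl(s: bytes) -> bytes:
    # Multiply by x in GF(2^128) with the reduction constant 0x87.
    n = int.from_bytes(s, "big") << 1
    if n >> 128:
        n = (n & ((1 << 128) - 1)) ^ 0x87
    return n.to_bytes(BLOCK, "big")


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def s2v_absorb(key: bytes, fixed: list[bytes]) -> bytes:
    # Fold all inputs except the last into D; D is only 16 bytes and
    # can be cached while these inputs stay constant.
    d = prf(key, bytes(BLOCK))
    for s in fixed:
        d = xor(dbl(d), prf(key, s))
    return d


def s2v_finish(key: bytes, d: bytes, last: bytes) -> bytes:
    # Process the final (varying) input against the cached D.
    if len(last) >= BLOCK:
        t = last[:-BLOCK] + xor(last[-BLOCK:], d)  # "xorend"
    else:
        padded = last + b"\x80" + bytes(BLOCK - len(last) - 1)  # X10*
        t = xor(dbl(d), padded)
    return prf(key, t)


key = bytes(range(16))
cached_d = s2v_absorb(key, [b"label", b"context"])  # computed once
ap_macs = [bytes.fromhex("001122334455"), bytes.fromhex("66778899aabb")]
derived = [s2v_finish(key, cached_d, mac) for mac in ap_macs]
```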

> Regarding the need to avoid data marshalling, it is true that in many
> AEAD cases, there are multiple data fields that are authenticated.
> However, in most of my experience (e.g. RFCs 4106 and 4543 and
> IEEE P1619) there is no data-marshalling issue that needs to be solved,
> because the fields are already contiguous, and the lengths of the
> fields are fixed, so there is no length-encoding overhead.  In ESP
> (RFC 4303, not RFC 2406) and SRTP, there are AD inputs that are not
> contiguous, but which have fixed lengths; this is because both of
> these protocols have an authenticated but unencrypted header, plus an
> authenticated "extended sequence number" that holds the high-order
> bits of the sequence number, which is not carried in the packet.
> For these protocols, an AEAD implementation that supports an
> init/update/final interface is sufficient.   Also, in practice these
> protocols are often implemented by copying the extended sequence
> number field.   Perhaps there are other application areas in which the
> data-marshalling before AEAD is onerous - maybe someone can chime in.
>
> On reflection, it seems to me that the enduring advantage for using a
> vector of inputs is the fact that it frees the user from the need to
> explicitly encode the lengths of the component fields, when those
> fields have variable length.
>
> Of course, applications that make use of the vector-of-inputs feature
> will need to define how that vector is constructed, and make sure that
> the order of the inputs is well understood. In addition, if the
> applications want to make use of a conventional AEAD method as well,
> then they will also need to define a data-marshalling function.

  Yes, this requirement is not unique to an AEAD mode that takes a vector
of inputs. Even an init/update/finish mode would need to define how the
components of the AD are used.
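As a sketch of the data-marshalling function David mentions: an application that feeds several variable-length AD fields into a conventional single-AD AEAD needs an unambiguous encoding with a fixed field order. The Python example below uses one hypothetical encoding (a 4-byte count, then each field length-prefixed); it is not taken from any standard, and any injective encoding would serve.

```python
import struct


def marshal_ad(components: list[bytes]) -> bytes:
    """Hypothetical encoding: a 4-byte big-endian count, then each
    component as a 4-byte big-endian length followed by its bytes."""
    parts = [struct.pack(">I", len(components))]
    for c in components:
        parts.append(struct.pack(">I", len(c)))
        parts.append(c)
    return b"".join(parts)


# Field boundaries and order both survive the encoding.
enc = marshal_ad([b"ab", b"c"])
```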

> Thirdly, I think it would be useful and interesting to have a comparison
> between arbitrary-length pseudorandom permutations with associated data
> (such as EME and XCB) and SIV.   But I think that's probably getting
> beyond the scope of the SIV draft ;-)

  A good topic for the list :-)

  thanks again for your comments and I will incorporate them into my -01
version which should be coming out shortly.

  regards,

  Dan.

_______________________________________________
Cfrg mailing list
Cfrg@ietf.org
https://www1.ietf.org/mailman/listinfo/cfrg