Re: [MLS] hardening MLS against bad randomness

Hey Richard,

> I'm inclined to say that this is not a matter for the MLS spec.

> So I definitely would rather go for the general fix.

I'm not sure I understood you. Could you clarify? Are you leaning towards
putting the general fix in the protocol RFC, or rather leaving it out? Or
something else?

As for this not being a matter for the spec, my take is a bit different. In
principle at least, I like the idea of designing crypto/security tools to
withstand bad randomness when possible (at a reasonable cost). Two examples of
modern designs which I believe take a similar view are EdDSA (i.e. Ed25519 &
Ed448) and AES-GCM-SIV (and other nonce-reuse/misuse-resistant AEADs). They all
include mechanisms to reduce the impact of bad randomness. I think similar
defenses could be in scope for MLS too.
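
To make the EdDSA example concrete, here's a minimal Python sketch of just the
nonce step of Ed25519 signing, as I read RFC 8032 (Section 5.1.6): the nonce r
is a hash of a secret-key-derived prefix and the message, so no RNG is consulted
at signing time. The function name is my own; point arithmetic and the rest of
signing are left out.

```python
import hashlib

# Order of the Ed25519 base point (RFC 8032, Section 5.1).
ED25519_L = 2**252 + 27742317777372353535851937790883648493


def ed25519_nonce(seed: bytes, message: bytes) -> int:
    """Return the per-signature nonce scalar r used by Ed25519 signing.

    No RNG is involved: r depends only on the 32-byte secret key and the
    message, so a broken RNG at signing time cannot cause nonce reuse.
    """
    if len(seed) != 32:
        raise ValueError("Ed25519 secret keys are 32 bytes")
    h = hashlib.sha512(seed).digest()
    prefix = h[32:]  # second half of the hashed secret key
    digest = hashlib.sha512(prefix + message).digest()
    # Interpret the 64-byte digest as a little-endian integer, reduce mod L.
    return int.from_bytes(digest, "little") % ED25519_L
```

The same key and message always give the same r, and different messages give
unrelated ones, which is exactly what makes bad signing-time randomness
irrelevant for EdDSA.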

Could you elaborate on why you lean towards this not being a matter for the MLS
spec?

In the meantime, continuing with the EdDSA / AES-GCM-SIV analogies, the general
fix is philosophically closer to EdDSA's approach: functionality doesn't depend
on actually implementing the defense. (The verifier of an EdDSA signature would
still accept it if R were chosen differently than specified. Similarly, a
receiver of a Commit message in MLS wouldn't know whether the sender is using an
entropy pool.) Nonetheless, the EdDSA designers felt it was a good idea to
build their scheme with such a defense in place.
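
Here's a minimal sketch of the entropy pool from the general fix (quoted below),
in Python with only the standard library. Assumptions, none of which are in the
proposal itself: HKDF-SHA256 (RFC 5869, hand-rolled from hmac), the pool used as
the HKDF salt and B as the input keying material, a 32-byte pool, os.urandom as
the default external RNG, and the hypothetical names EntropyPool, random_bytes
and the "mls pool" info label.

```python
import hashlib
import hmac
import os
from typing import Callable

HASH_LEN = 32  # SHA-256 output length; also the (assumed) pool size


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # RFC 5869 Extract; an empty salt is treated as HashLen zero bytes.
    return hmac.new(salt or bytes(HASH_LEN), ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    # RFC 5869 Expand: T(i) = HMAC(PRK, T(i-1) || info || i).
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]


class EntropyPool:
    """Hypothetical local entropy pool, sketching the "general fix"."""

    def __init__(self, external_rng: Callable[[int], bytes] = os.urandom,
                 initial_pool: bytes = b""):
        self._rng = external_rng
        # Assumption: start from a persisted pool if given, else all zeros.
        self._pool = initial_pool or bytes(HASH_LEN)

    def random_bytes(self, n: int) -> bytes:
        b = self._rng(HASH_LEN)            # (supposedly) random bytes B
        prk = hkdf_extract(self._pool, b)  # mix B with the current pool
        okm = hkdf_expand(prk, b"mls pool", HASH_LEN + n)
        self._pool = okm[:HASH_LEN]        # first part overwrites the pool
        return okm[HASH_LEN:]              # second part goes to MLS
```

With a stuck external RNG (say, one that always returns zeros) the outputs
still differ from call to call, since the pool keeps being overwritten. They
are then only as unpredictable as the pool itself, which is the "either the
pool or the external entropy is good" property.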

The second solution for MLS ("hashing in old keys") is more in line with
AES-GCM-SIV, in that if you don't implement it as specified then you lose
functionality: receivers won't decrypt / process your output correctly.
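
Similarly, a rough Python sketch of the "hashing in old keys" path-secret
derivation from the pseudocode quoted below. Assumptions: HKDF(ikm, label,
context, len) is read as RFC 5869 Extract (zero salt) followed by Expand with
info = label || context, which the pseudocode leaves open; SHA-256 and 32-byte
private keys; commit_secret is passed in rather than derived from epoch_secret;
and Derive-Key-Pair is omitted, so only path secrets are returned.
commit_path_secrets is a hypothetical name.

```python
import hashlib
import hmac
from typing import List, Optional

HASH_LEN = 32  # assumption: SHA-256 ciphersuite
KEY_LEN = 32   # assumption: ciphersuite private-key length (e.g. X25519)


def hkdf(ikm: bytes, label: bytes, context: bytes, length: int) -> bytes:
    # Assumed reading of HKDF(...) in the mail: RFC 5869 Extract with an
    # all-zero salt, then Expand with info = label || context.
    prk = hmac.new(bytes(HASH_LEN), ikm, hashlib.sha256).digest()
    out, block, counter = b"", b"", 1
    while len(out) < length:
        block = hmac.new(prk, block + label + context + bytes([counter]),
                         hashlib.sha256).digest()
        out += block
        counter += 1
    return out[:length]


def commit_path_secrets(leaf_hpke_secret: bytes, commit_secret: bytes,
                        old_node_priv: List[Optional[bytes]]) -> List[bytes]:
    """Path secrets for a Commit with old node keys hashed in (sketch).

    old_node_priv[i] is the current private key of the (i+1)-th node on the
    committer's direct path, or None if that node is blank. Element n of
    the result is path_secret[n]; element 0 is the leaf's path_secret[0].
    """
    path_secret = [hkdf(leaf_hpke_secret + commit_secret, b"path", b"",
                        HASH_LEN)]
    for old in old_node_priv:
        # Blank node: mix in KEY_LEN zero bytes instead of an old key.
        sk = old if old is not None else bytes(KEY_LEN)
        path_secret.append(hkdf(path_secret[-1] + sk, b"path", b"", HASH_LEN))
        # Derive-Key-Pair(path_secret[n]) would run here; omitted.
    return path_secret
```

Feeding the same bad (e.g. all-zero) leaf_hpke_secret with different old node
keys gives different path secrets above the leaf, which is the point: the old
keys carry the entropy when the fresh randomness doesn't.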

- Joël

On 22/04/2020 04:46, Richard Barnes wrote:
> Hi Joël,
> 
> Thanks for thinking about this, and for the reminder on the call.
> 
> I'm inclined to say that this is not a matter for the MLS spec.  It's not a
> protocol / interoperability issue, it's a consideration for how you build your
> client.
> 
> So I definitely would rather go for the general fix.  If some better platform
> guidance is needed, run it through CFRG.  Maybe describe the implications of bad
> randomness in the Security Considerations to the protocol.
> 
> --Richard
> 
> On Fri, Apr 17, 2020 at 8:42 AM Joel Alwen <jalwen@wickr.com> wrote:
> 
>     hey everyone,
> 
>     Sandro Coretti and I have the following suggestions around MLS's defenses
>     against bad randomness.
> 
>     The issue: Currently the CGKA (continuous group key agreement) protocol
>     TreeKEM relies heavily on group members continuously having good
>     randomness available.
> 
>     The problem: Good randomness may not always be available, in which case
>     things can go pretty wrong. E.g. say Alice does a Commit with too little
>     entropy (from the adversary's perspective). Then all the new keys on her
>     direct path (and the update_secret) have too little entropy, because they
>     are all derived deterministically *purely* from the entropy she samples in
>     that procedure.
> 
>     To be clear, by "bad randomness" we're not just talking about low entropy
>     caused by an on-device attack by an adversary. We also mean things like
>     highly deterministic boot processes/environments (e.g. on some VMs,
>     embedded devices, or even sometimes mobile phones). There are also buggy
>     RNGs and key generators (things like the Infineon keygen bug in the
>     Estonian ID cards, and the issues in "Mining Your Ps and Qs"). The point
>     is, low entropy can be a practical concern even when an adversary can
>     *not* access the rest of your local state.
> 
>     The fix: We thought of 2 types of fixes to reduce the dependence on continuous
>     fresh entropy.
> 
>     General Fix
>     -----------
>     Basic Idea: MLS explicitly mandates a local entropy pool.
> 
>     Whenever MLS needs random bytes (e.g. when generating a KeyBundle or doing
>     a Commit), call the OS/crypto lib to get some (supposedly) random bytes B
>     from outside. Run HKDF on B together with your local entropy pool. The
>     first part of the output overwrites your entropy pool; the second part is
>     the bytes you actually pass on to the calling MLS function.
> 
>     This gives you 2 nice properties. First, you accumulate whatever entropy B
>     *may* have into your pool. So even with consistently low (but not 0)
>     external entropy, your pool fills up and eventually you're good forevermore
>     (until your state leaks, of course). That's true even if your external
>     calls start going back to 0 entropy (e.g. say you update to a buggy
>     external RNG implementation, or you're in a VM and the OS is getting
>     (poorly) initialized from some fixed snapshot while your app's local state
>     is persistent). Second, if your pool leaks, then as soon as your OS/lib
>     gives you enough good entropy again, you're back to having a good pool
>     too. So you get PCS (post-compromise security) for leaked randomness. As
>     long as either the pool or the external entropy is good, the bytes passed
>     to MLS have enough entropy.
> 
>     Pro: general catch-all solution; efficient, easy to analyze.
>     Con: requires implementors to handle cryptographically valuable randomness
>     and, probably, to hook RNG calls from their crypto libs. That's a new and
>     maybe touchy practical requirement compared to what MLS currently asks of
>     implementors.
> 
> 
>     MLS Specific Fix
>     ----------------
>     To avoid the above cons (and maybe others we didn't think of), here's an
>     alternative approach using only functions already defined in the MLS spec.
>     We demonstrate it on the case of a Commit, as this is probably the most
>     egregious case. (If Alice uses bad randomness in a Commit, it doesn't just
>     "pollute" her own leaf, as with an update proposal; it pollutes a whole
>     chunk of the ratchet tree.)
> 
>     Basic idea: Whenever sampling/deriving a new secret key, make it also depend
>     on the old secret key and on the application key schedule. That way, if
>     either the old secret or the application key schedule was secure, then so
>     is the new key, *even when using bad randomness*. Mixing in the application
>     key schedule is valuable if there was no old secret, e.g. when we're
>     assigning a new key pair to a previously blank node.
> 
>     For concreteness, here's how that can work for the Commit operation (cf.
>     Section 5.4 of MLS protocol draft version 9):
> 
>     commit_secret = HKDF-Expand(epoch_secret, "mls 1.0 welcome", Hash.length)
> 
>     ---------- snip -----------
>     path_secret[0] = HKDF(leaf_hpke_secret || commit_secret, "path", "",
>                           Hash.length)
> 
>     For each node n = 1, 2, ... on the direct path:
>       If node n is blank
>         sk[n] := ciphersuite-key-length string of 0 bytes
>       Else
>         sk[n] := old_node_priv[n]
> 
>       path_secret[n] = HKDF(path_secret[n-1] || sk[n], "path", "", Hash.length)
> 
>       new_node_priv[n], new_node_pub[n] = Derive-Key-Pair(path_secret[n])
>     ---------- snip -----------
> 
>     Pros: only uses existing MLS functions; other parties implicitly verify
>     that the fix is being used by re-deriving the same key material as the
>     committer; no need to hook RNG calls.
> 
>     Con: less general, so a full fix requires changing other parts of MLS where
>     security-critical randomness is used, in particular key bundle generation
>     (or at least updates within an existing session).
> 
> 
>     - Joël & Sandro
> 
>     _______________________________________________
>     MLS mailing list
>     MLS@ietf.org
>     https://www.ietf.org/mailman/listinfo/mls
>