Re: [MLS] Simplifying the key schedule

Richard Barnes <rlb@ipv.sx> Mon, 25 February 2019 15:32 UTC

From: Richard Barnes <rlb@ipv.sx>
Date: Mon, 25 Feb 2019 10:32:00 -0500
To: Joel Alwen <jalwen@wickr.com>
Cc: Messaging Layer Security WG <mls@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/mls/wQG2Vib23uPsBjHw1st54riRwRI>

Hi Joel,

Some notes inline:

On Sun, Feb 24, 2019 at 12:17 AM Joel Alwen <jalwen@wickr.com> wrote:

> Hi Richard,
>
> Sandro Corretti and I just discussed the questions you brought up in
> your email about the key schedule, and we've come round to the opinion
> that, of the options you present, we prefer:
>
> - for the roster, a Merkle tree seems to be a good solution
> - for the ratchet tree, an ampelmann hashing scheme should work well
>
> In our opinion, linear state size (with reasonably small constants) is
> the least of the three evils when compared to needing extensive hashing
> for each tree update and needing larger packet sizes for the protocol
> (e.g. due to membership proofs).
>

Thanks for the clear priorities!


> Having said that, I wanted to suggest an alternative: namely, including
> the IDs of leaves as part of what gets hashed for the ratchet tree.
> This makes including the roster redundant, which means we would save on
> all the Merkle tree storage as well as the hashing during join/leave
> ops. The only cost I see is that we might want to include the *hash* of
> the IDs as part of the state, to avoid having to recompute it when a
> party updates their PK. (But that still needs less storage than a full
> Merkle tree on the roster, so it's also an improvement.)
>

This seems sensible to me.  So you would basically end up with a unified
tree with the following shape:

- Leaves hold (DHPublicKey, Credential, NodeHash)
- Intermediate nodes hold (DHPublicKey, LeftHash, RightHash, NodeHash)

It actually looks to me like this is a wash from a storage POV -- in each
case, you have 2n-1 DH keys, n credentials, and n-1 hashes.  But it
probably saves you some hashing in some cases.
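
For concreteness, here's a rough Python sketch of how the per-node hashes
might be computed in that unified tree. The field layout, the use of
SHA-256, and the exact concatenation order are my own assumptions for
illustration, not anything the draft pins down:

    import hashlib

    def H(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def leaf_node_hash(dh_public_key: bytes, credential: bytes) -> bytes:
        # The leaf hash covers both the key material and the member's
        # identity, so no separate roster Merkle tree is needed.
        return H(dh_public_key + H(credential))

    def parent_node_hash(dh_public_key: bytes, left_hash: bytes,
                         right_hash: bytes) -> bytes:
        # Intermediate nodes bind their own key to both children's hashes.
        return H(left_hash + right_hash + dh_public_key)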



> While we are on the subject of ampelmann trees, I wanted to make a
> suggestion for how to deal with blank nodes. I think it's conceptually
> simpler (and thus hopefully also less error prone / easier to code) than
> using the resolution of blank nodes to define their values in the
> ampelmann hash. We let the value of each node begin with an indicator
> bit b (or octet if that's more practical). So node values are strings of
> the form b || PK. (Alternatively, for leaves it would be b || PK ||
> H(ID).) When b indicates the node is blank (b=1), we simply define PK
> (and H(ID)) to be all 0s.
>

This is actually already mostly covered in the current spec.  The nodes in
the ratchet tree are of type optional<DHPublicKey>, which encodes as
0x01 || PK when populated and 0x00 when blank.  So if we just use that same
struct for the V input to the ampelmann hashing, things should be pretty
straightforward to code.
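
As a minimal sketch of that encoding (the helper name and the Python typing
are mine; the presence-octet values follow the optional<T> convention
described above):

    from typing import Optional

    def encode_optional_public_key(dh_public_key: Optional[bytes]) -> bytes:
        # optional<DHPublicKey>: a one-octet presence flag, then the value.
        # 0x00 marks a blank node; 0x01 || PK marks a populated one.
        if dh_public_key is None:
            return b"\x00"
        return b"\x01" + dh_public_key

Feeding this encoded value in as V means blank nodes need no special
handling in the tree-hash logic itself.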


> I'm also in favour of Raphael's suggestion to define the hash of a node
> to be H(L || R || V) where L and R are the hash of the left and right
> child while V denotes the value of the current node. It seems simpler
> than H(H(L || R) || V).
>

The only difference that occurs to me here is that the nested form offers a
factor of two in proof size: if you don't care about the children, you can
just send H(L || R) instead of sending (L, R).  Of course, there's a
corresponding factor of two in hashing, so there's another trade-off space
to select from.
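
To make the two variants concrete, here's a small sketch (the function
names and the SHA-256 choice are mine, just for illustration):

    import hashlib

    def H(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def node_hash_flat(left: bytes, right: bytes, value: bytes) -> bytes:
        # Raphael's suggestion: one hash over both children and the value.
        return H(left + right + value)

    def node_hash_nested(left: bytes, right: bytes, value: bytes) -> bytes:
        # Nested form: a proof that doesn't need the children can carry
        # H(left || right) alone, at the cost of an extra hash per node.
        return H(H(left + right) + value)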

--Richard



>
> - Joel
>
>
> On 2/23/19 10:27 AM, Richard Barnes wrote:
> > At the last interim, we filed an issue to discuss simplifying the key
> > schedule.  In particular, folks were unhappy that the whole, possibly
> > gigantic GroupState object has to be hashed on every epoch change.  When
> > I sat down to think about how to simplify, though, it wasn't clear what
> > direction to go in, because it wasn't clear exactly what people were
> > finding painful.
> >
> > I'll posit from the start that I'm pretty sure that we need to somehow
> > include the tree, the roster, and the message transcript into the key
> > schedule.  The transcript inclusion follows from the same reasons as it
> > does in TLS; the roster and tree are needed because the whole transcript
> > isn't accessible to a new joiner (so the new joiner can't infer those
> > values from the transcript).
> >
> > With that said, there are multiple ways to represent those values for
> > inclusion in the key schedule:
> >
> > 1. Directly (as now)
> > 2. As hashes
> > 3. As structured hashes, e.g., a Merkle or "ampelmann" tree
> >
> > People are clearly unhappy with (1).  The difference between 2 and 3
> > (and between variants of 3) is what value we're trying to achieve.  (2)
> > reduces the amount of hashing somewhat, but it doesn't change the amount
> > of state you have to keep -- each member has to keep the full roster and
> > tree, and hash the whole thing when it changes.
> >
> > For (3), it seems like there are weak and strong forms.  In the weak
> > form, we make it so that the member has to hash a sub-linear-size amount
> > of data to update its representation of the tree/roster when it changes
> > (e.g., in a Merkle tree, you have to recompute log(N) nodes).  But each
> > member still has to have the whole structure.  In the strong form, we
> > arrange the protocol so that a member can operate while holding a
> > sub-linear amount of state (e.g., just a copath/frontier in some tree).
> > Note that the strong form would entail pretty major protocol changes,
> > e.g., bringing back the Merkle membership proofs that were in earlier
> > versions of the protocol.  For similar reasons, this would also entail
> > larger messages.
> >
> > So basically, the mapping of options to values is:
> >
> > 1. Overall simplicity
> > 2. A smaller coefficient on the linear amount of data hashed
> > 3(weak). Sub-linear amount of data hashed
> > 3(strong). Sub-linear amount of state
> >
> > In my personal value judgement, I would probably end up somewhere
> > between (2) and (3,weak).  It seems pretty unavoidable to me that
> > members will have *some* linear-size state for the group, even if it's
> > only names for the members; adding, say, a key and a roster entry for
> > each of those doesn't seem like a huge increment.  So appealing though
> > the state savings would be, they're not compelling enough to merit the
> > protocol complexity and overhead.
> >
> > All that said, I would be interested in other folks' thoughts on which
> > options are to be preferred here, or if there are other options that
> > come to mind.
> >
> > Thanks,
> > --Richard