[MLS] Optimizing Unmerged Leaves (or: bringing the MLS tree size back to O(n))

Karthik Bhargavan <karthikeyan.bhargavan@inria.fr> Fri, 21 May 2021 14:59 UTC

Return-Path: <karthikeyan.bhargavan@inria.fr>
X-Original-To: mls@ietfa.amsl.com
Delivered-To: mls@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 37E923A13A7 for <mls@ietfa.amsl.com>; Fri, 21 May 2021 07:59:52 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.895
X-Spam-Level:
X-Spam-Status: No, score=-1.895 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_BLOCKED=0.001, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 0mJhlioBfKgj for <mls@ietfa.amsl.com>; Fri, 21 May 2021 07:59:47 -0700 (PDT)
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 631863A13A6 for <mls@ietf.org>; Fri, 21 May 2021 07:59:45 -0700 (PDT)
IronPort-Data: =?us-ascii?q?A9a23=3AUO/Bt6yygUSrudKaovx6t+cAxCrEfRIJ4+MujC/?= =?us-ascii?q?XYbTApG9wgzFVyGEbCG6FMvnYNmD1fIt3bo2w9UMA6pTczYdiHQtv/xmBbVoa8?= =?us-ascii?q?JufXYzxwmTYZn7JcJWbFCqL1yivAzX5BJhcokT0+1H9aNANkVEmjfvRHuemUba?= =?us-ascii?q?dUsxMbVQMpBkJ2EsLd9ER0tYAbeiRW2thiPuqyyHtEAfNNw1cbgr435m+RCZH5?= =?us-ascii?q?5wejt+3UmsWPpintHeG/5Uc4Ql2yauZdxMUSaEMdgK2qnqq8V23wo/Z109F5tK?= =?us-ascii?q?NibPnakYOQ7PUIU6HkmJSVsBOgDAS92prj/h9baJFLx4J011lnPgooDlJnZ+5U?= =?us-ascii?q?xspP67Bie0bFRNYGjtxNLNP/pfGJ2K+uIqd1SUqdlOxm6QyVBpoYN1wFuFfRDs?= =?us-ascii?q?mGeYjADUJdTiCiv64hrWhRYFRam4LRCXwFNNO/yg9k3SAVa9jGM6bBb/H+5lew?= =?us-ascii?q?TI9nMFFFPzaaowXc1JSgN37S0UnEj8q5FgWwI9EXkXCTgA=3D?=
IronPort-HdrOrdr: =?us-ascii?q?A9a23=3Af+uAKq8C3jjPW9HlPL5uk+DjI+orL9Y04lQ7?= =?us-ascii?q?vn2ZKCYlEPBw8vrFoB11726TtN98YgBCpTn/AtjmfZqsz+8R3WB5B97LN2nbUQ?= =?us-ascii?q?OTTb2KhrGSpwEIdReOj9K1Fp0NT0G9MrDN5JRB4voSmDPIa+rICePozJyV?=
X-IronPort-AV: E=Sophos;i="5.82,319,1613430000"; d="scan'208,217";a="509403658"
Received: from 89-156-101-160.rev.numericable.fr (HELO smtpclient.apple) ([89.156.101.160]) by mail2-relais-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 21 May 2021 16:59:42 +0200
From: Karthik Bhargavan <karthikeyan.bhargavan@inria.fr>
Content-Type: multipart/alternative; boundary="Apple-Mail=_015259A4-CBA8-4492-B018-6766EDC48C32"
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.80.0.2.43\))
Date: Fri, 21 May 2021 16:59:41 +0200
References: <162040556024.10515.18434867074453969330@ietfa.amsl.com> <F94C2B20-DC04-4028-937E-698D598E319D@sn3rd.com> <81ED88F6-16A8-45C4-A9B9-FD6659CB68FB@sn3rd.com> <4614A3C1-8461-48F8-AD53-C1399BC5F673@sn3rd.com>
To: MLS List <mls@ietf.org>
In-Reply-To: <4614A3C1-8461-48F8-AD53-C1399BC5F673@sn3rd.com>
Message-Id: <6B8101F5-A54A-429A-82B0-5386E321F525@inria.fr>
X-Mailer: Apple Mail (2.3654.80.0.2.43)
Archived-At: <https://mailarchive.ietf.org/arch/msg/mls/tprFmab1jw8CXD55gY3q3D9D09w>
Subject: [MLS] Optimizing Unmerged Leaves (or: bringing the MLS tree size back to O(n))
X-BeenThere: mls@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Messaging Layer Security <mls.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/mls>, <mailto:mls-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/mls/>
List-Post: <mailto:mls@ietf.org>
List-Help: <mailto:mls-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/mls>, <mailto:mls-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 21 May 2021 14:59:52 -0000

Hello All,

Théophile, Benjamin, and I have been writing a formal specification of MLS in F* and we’re happy to report a compleger interoperable spec that (with Franziskus’ help) shares test vectors with OpenMLS.
In the course of writing this specification, we observed several potential improvements in the way MLS does things, and this email is about the first of these. Subsequent emails will have other questions/comments.

The way we store unmerged leaves in the ratchet tree is sub-optimal since it stores a lot of redundant information.
In the worst case, a tree can have n/2 unmerged leaves  (read the end of this email to see how we obtained this number).
Consequently, the total size of the ratchet tree can be as big as 2*n*log(n) + 4*n bytes (for n participants), and is never lower than 4*n bytes.
This is unfortunate, since the unmerged nodes optimization has now bloated our tree to O(n logn) from O(n).
Consequently, the welcome message/initial tree and local storage is now O(n log n).

For more precision, in the rest of this email, we won't say that "a leaf  l is unmerged", but "a leaf l is unmerged at a node n", meaning that `n` is an ancestor of the leaf `l` and the member at the leaf `l` doesn't know the node secret at `n`.
Then, a leaf `l` is "unmerged” (in the sense of the MLS spec) iff. there exists a node `n` such that this leaf `l` is unmerged for `n`.

In the current MLS spec, we store the index of a leaf `l` at each ancestor node `n` such that `l` is unmerged for `n`.
To see why this results in quite a bit of storage redundancy, observe the following property:  If a leaf `l` is unmerged for the parent of a node `n`, then `l` is also unmerged for `n`.
Given this observation, we can optimize the unmerged leaves design to bring the tree size back to O(n).

Solution 1
—————

One possibility, close to the current MLS spec would be to not store the unmerged leaf at a node if it is already stored at the parent.
This way, every leaf is present at most once in some `unmerged_leaves` array.
In the worst case, this uses 4*n + 4*(n/2) = 6*n bytes (4*n for the `unmerged_leaves` array size, 4*(n/2) because at most n/2 leaves are stored in the `unmerged_leaves` arrays).
In the best case, this uses 4*n bytes.
The tree resolution algorithm will now have to do a little more calculation to re-compute the unmerged leaves at each node.

Solution 2
—————

An alternative, which we prefer, is to get rid of the unmerged leaves array entirely and replace them by epoch numbers:
 - for each leaf, store the epoch of its creation in the tree (when the leaf was added),
 - for each parent node, store the epoch at which it was last updated.
Then, a leaf `l` is unmerged for node `n` iff. n.last_update_epoch < l.creation_epoch.
Finding the list of unmerged leaves at a node then becomes part of tree resolution, and a member can pre-compute and store this array if it likes.

In this design, we will store a 64-bit epoch number at each leaf and each parent node, thus it uses 8*2*n = 16*n bytes.
We believe that having an epoch number documenting the last update time of each node is in itself a useful feature for debugging and for security analysis.
In our model, it makes stating secrecy and authentication invariants easier.

Solution 2+
—————

However, if we wish to compress further, we can do so as well, using two facts:
- if a node `n` has two children `l` and `r`, then n.last_update_epoch is equal to either l.last_update_epoch or r.last_update_epoch,
- if `l` is unmerged for some node then l.creation_epoch = l.last_update_epoch (because in this case, the leaf never generated an UpdatePath) (the converse is not true, for example in the case when its sibling generates an UpdatePath).

So at each node, we can store the direction from which the node was last updated (for example with a boolean flag), then given any node `n` we can "follow the path" down the tree to obtain a leaf `l`, and we have n.last_update_epoch = l.last_update_epoch.
Furthermore, we don't need to store the creation epoch for leaves:
- if `l` is unmerged for `n`, then n.last_update_epoch < l.creation_epoch = l.last_update_epoch,
- if `l` is not unmerged for `n`, then n.last_update_epoch >= l.last_update_epoch (because when `l` is updated, it updates all the nodes above it).
This uses 8*n + 1*n = 9*n bytes for the whole tree.

Comments are welcome, and do please correct us if we misunderstood something.
In our opinion, it would be great to simplify the tree data structure and compress its size back to O(n) at the same time.

Best regards,
Théophile, Benjamin, and Karthik

Appendix : How to create a tree with n/2 unmerged leaves?

First, add n participants to the tree.
Then, remove the participants with odd leaf index.
Then, generate an update path from every participant with even leaf index (this ensures that there are no blank nodes)
Then, add new participants in leaves with odd leaf index, without regenerating the group secret.
Every leaf with odd leaf index will be present in the `unmerged_leaves` array on all the nodes above it, so at each level of the tree there are n/2 unmerged leaves stored, so the whole tree will store 4*(n/2)*log(n) = 2*n*log(n) bytes of information.

We can’t have a tree with more unmerged leaves, because a node at level 1 can't have both its children is its `unmerged_leaves` array, otherwise this node would be blank.


> On 21 May 2021, at 16:05, Sean Turner <sean@sn3rd.com> wrote:
> 
> Just another FYI!
> 
>> On May 12, 2021, at 08:26, Sean Turner <sean@sn3rd.com> wrote:
>> 
>> Just an FYI that we have a meeting scheduled for next week.
>> 
>> spt
>> 
>>> On May 7, 2021, at 13:33, Sean Turner <sean@sn3rd.com> wrote:
>>> 
>>> Meeting Scheduled!
>>> 
>>> More information can also be found here:
>>> https://datatracker.ietf.org/meeting/interim-2021-mls-01/session/mls
>>> 
>>> spt
>>> 
>>>> On May 7, 2021, at 12:39, IESG Secretary <iesg-secretary@ietf.org> wrote:
>>>> 
>>>> The Messaging Layer Security (mls) WG will hold
>>>> a virtual interim meeting on 2021-05-26 from 10:00 to 12:00 America/New_York (14:00 to 16:00 UTC).
>>>> 
>>>> Agenda:
>>>> GitHub issues and pull requests related to -protocol and -architecture.
>>>> 
>>>> Information about remote participation:
>>>> https://ietf.webex.com/ietf/j.php?MTID=m0e7f495c200143a9f02b11d4d7dfa487
>>>> 
>>>> _______________________________________________
>>>> IETF-Announce mailing list
>>>> IETF-Announce@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/ietf-announce
>>> 
>> 
> 
> _______________________________________________
> MLS mailing list
> MLS@ietf.org
> https://www.ietf.org/mailman/listinfo/mls