Re: [MLS] MLS in decentralised environments

Simon Friedberger, Wed, 04 April 2018 16:58 UTC

XMPP has long had problems with MUCs (multi-user chats) because they are
used for both:

"group chats": private, few users, users usually constant, consistent
timeline important (the WhatsApp model)

"chat rooms": public, many users, users dynamic, splitting would be
acceptable, transcript optional (the IRC model)

In general "group chats" have the stricter requirements and are probably
better served by a centralized server.

I think the use-case for allowing splits is pretty weak.

1. Users mostly don't want splits because they don't fit the model of a
modern group chat, where your presence in the room is independent of the
connection of your devices. In fact, a split is essentially a case of
"lost messages", which is probably the main complaint about XMPP and
Signal that I hear.

2. It also doesn't seem to me that disallowing splits forces
centralization, as long as multiple servers can host chats. Chats hosted
on different servers would simply be different chats to the user as well.

Having said that, allowing p2p messengers would be great, but maybe they
just have to elect a leader. Supporting distributed setups is a
necessity, but at the level of MLS they can probably be treated as one.

Best Regards,

On 03.04.2018 11:29, Matthew Hodgson wrote:
> Hi all,
> [TL;DR: MLS doesn't seem to support decentralisation.  Can we fix
> that, especially given it'll help solve other problems too?  I have a
> handwavey proposal.]
> Since IETF101 I've been trying to work out how well MLS could be
> applied in fully decentralised environments which lack any single
> controlling server, as it seems that the current proposal effectively
> rules this out due to the state sequencing requirements (thanks to Ben
> Schwartz for spelling this out to me after the BOF!).
> This feels like it could be quite a large oversight, given there are
> many real-time communication protocols and services (e.g. Matrix[1],
> Tox[2], Briar[3], Secure Scuttlebutt[4], Whisper[5], PSS[6], psyc2[7],
> XMPP FMUCs[8] (and MIX?), even NNTP and SMTP) where messages are
> replicated over a network of peers without any single controlling
> server - all of which could benefit from the well specified,
> interoperable & scalable group e2e encryption that MLS promises! 
> There's also a more ideological argument that interoperable
> communication is such a fundamental right that the IETF should support
> services which enable communication without requiring a central
> logical point of control (much as the internet itself is decentralised).
> More practically, there may be some benefit to considering
> decentralisation in MLS in general: for instance, general solutions to
> races and key synchronisation within a decentralised network could
> also solve races in a centralised deployment.  It could also improve
> scalability and geo-redundancy by avoiding the need for an atomic
> ordering system for state changes.
> To give some concrete context: all conversations in Matrix are
> expressed as Merkle DAGs of messages which are replicated over the
> participating servers using eventual consistency semantics (a bit like
> a full mesh of Git repositories all constantly pushing commits to one
> another).  As a result the conversation DAG often forks: either due to
> races, partitioned networks, disconnected or offline servers etc - but
> these temporary forks are very much a desirable feature, allowing
> partially disconnected operation (e.g. letting a site continue
> communicating locally even when isolated from the wider network), and
> aiding scalability (no need for global locks or sequencing).
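
A minimal Python sketch of how such a fork shows up in a hash-linked event graph, assuming each event is content-addressed by a SHA-256 hash over its body and its parent IDs. The `event_id` and `heads` helpers are hypothetical illustrations, not Matrix's actual event format or hashing rules; `heads` plays roughly the role of what Matrix calls forward extremities.

```python
import hashlib
import json


def event_id(content: str, parents: list) -> str:
    """Content-address an event: hash its body together with its parent IDs,
    so each ID commits to all history before it (a Merkle DAG)."""
    payload = json.dumps({"content": content, "parents": sorted(parents)}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def heads(dag: dict) -> set:
    """Return events that no other event names as a parent.
    More than one head means the conversation has forked."""
    referenced = {p for parents in dag.values() for p in parents}
    return set(dag) - referenced


# A creates the room; two servers then extend it concurrently.
root = event_id("A joins", [])
b = event_id("B joins", [root])
c = event_id("C joins", [root])
dag = {root: [], b: [root], c: [root]}
print(len(heads(dag)))  # 2: forked

# Any server that has seen both branches can heal the fork with a merge event.
merge = event_id("merge", [b, c])
dag[merge] = [b, c]
print(len(heads(dag)))  # 1
```

Because every ID commits to its parents, two servers that hold the same events agree on the same heads, whatever order the events arrived in.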
> Currently Matrix uses Olm[9] (a Double Ratchet Algorithm
> implementation) to maintain a full mesh of secure 1:1 channels between
> devices, and then shares a group ratchet (Megolm[10]) per sender over
> these channels.  The megolm ratchet is a simple hash ratchet which
> advances every message, and is replaced every N messages (or when
> group membership changes). The cost of replacement is O(N) with the
> size of the group, as is the cost of adding users to the group, which
> obviously makes ART & MLS’s O(log N) behaviour appealing.
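
To illustrate what a "simple hash ratchet" means here, the following is a simplified single-chain Python sketch. It is a stand-in for explanation only, NOT Megolm's actual construction (Megolm's specified ratchet is more elaborate); `ToyHashRatchet` and its HMAC labels are hypothetical. It shows the property that matters for the rest of this thread: holding the ratchet state at index i lets you derive every later message key, but no earlier one.

```python
import hashlib
import hmac


def _prf(state: bytes, label: bytes) -> bytes:
    # HMAC-SHA256 keyed by the current chain state
    return hmac.new(state, label, hashlib.sha256).digest()


class ToyHashRatchet:
    """Simplified single-chain hash ratchet (NOT Megolm's real construction)."""

    def __init__(self, state: bytes, index: int = 0):
        self.state = state
        self.index = index

    def advance_to(self, target: int) -> None:
        # One-way: hashing forward is easy, recovering earlier states is not.
        if target < self.index:
            raise ValueError("cannot ratchet backwards")
        while self.index < target:
            self.state = _prf(self.state, b"chain")
            self.index += 1

    def message_key(self) -> bytes:
        # Key for the message at the current index
        return _prf(self.state, b"message")

    def next_message_key(self) -> tuple:
        # Return (index, key) and advance, so each key is used only once
        idx, key = self.index, self.message_key()
        self.advance_to(self.index + 1)
        return idx, key


seed = bytes(32)  # placeholder seed; a real session would use random bytes
sender = ToyHashRatchet(seed)
sent = [sender.next_message_key() for _ in range(5)]

# Sharing the state at index 0 lets a recipient derive every later key,
# whereas a snapshot taken at index 3 could not go back to index 1.
receiver = ToyHashRatchet(seed)
receiver.advance_to(3)
assert receiver.message_key() == sent[3][1]
```

Replacing the ratchet means distributing a fresh seed to every member over their pairwise Olm channels, which is where the O(N) cost with group size comes from.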
> However, the eventual consistency semantics of a decentralised
> protocol introduce challenges for E2E encryption: for instance, the
> membership of a given room is not well defined, as there may be a
> partition of the room (either due to races or netsplit) which includes
> devices or users that a sender is not yet aware of.  This mandates a
> way of retrospectively syncing message keys between devices after such
> a fork: deliberately prioritising UX (ensuring messages that users
> expect to be able to decrypt can be decrypted) over forward secrecy. 
> The same mechanism can be used for cross-device history sync or
> sharing history from before users join a group.
> In Matrix we solve this by letting devices share group ratchet key
> state between each other over Olm by making so-called 'keyshare
> requests'. Devices must have explicitly verified and trusted the
> identity of the requesting device before they share keys to it (and
> currently, keyshare is only supported between a given user's devices -
> in future we could also support keyshare between all the group's
> devices if the requester is verified and can provide a proof of
> permission to view the requested content).  This obviously puts a
> large onus on the device verification mechanism to ensure an attacker
> isn't able to exfiltrate keys, but our experience has been that the
> improved UX is worth the risk (plus it can always be disabled if needed,
> e.g. in a single-server centralised deployment).
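
The sharing rule described above can be expressed as a small policy check. This is a hypothetical Python sketch: `Device`, `KeyshareRequest` and `should_share` are illustrative names, not part of any Matrix SDK, and the "proof of permission" is reduced to a boolean because its format isn't specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    user_id: str
    device_id: str


@dataclass(frozen=True)
class KeyshareRequest:
    requester: Device
    session_id: str


def should_share(owner: Device, req: KeyshareRequest, verified: set,
                 allow_cross_user: bool = False, has_permission_proof: bool = False) -> bool:
    """Decide whether `owner` should answer a keyshare request."""
    # Never share keys with a device whose identity has not been explicitly verified.
    if req.requester not in verified:
        return False
    # Current behaviour: only share between devices of the same user.
    if req.requester.user_id == owner.user_id:
        return True
    # Possible future behaviour: other verified group members who can prove permission.
    return allow_cross_user and has_permission_proof


alice_laptop = Device("@alice:example.org", "LAPTOP")
alice_phone = Device("@alice:example.org", "PHONE")
bob_phone = Device("@bob:example.org", "PHONE")
verified = {alice_phone, bob_phone}

print(should_share(alice_laptop, KeyshareRequest(alice_phone, "s1"), verified))  # True
print(should_share(alice_laptop, KeyshareRequest(bob_phone, "s1"), verified))    # False
```

Keeping the verification check first makes the dependency explicit: every branch that returns True sits behind device verification, which is why the whole scheme rests on that mechanism.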
> So, how can we support something like this in MLS?
> There seem to be two main obstacles: 1) the requirement for strict
> sequencing of state changes, 2) the lack of keyshare semantics to
> recover from the missing key data which is inevitable in an eventually
> consistent view of a room.  I'm going to ignore the second for now as
> it can be fixed out of band (although much like attachments, it feels
> like something which MLS should make /some/ recommendation on, given
> it can be incredibly useful as a primitive).
> However, for state sequencing: Am I right in saying that a race
> between B and C joining a group can cause one client to see a DH
> binary tree with frontier (AB, C) versus (AC, B), and thus have
> inconsistent root group keys - messages encrypted during the partition
> are going to inevitably be undecryptable by the other side?
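
The race can be reproduced with a toy ART-style tree in Python, in which each parent's secret is the Diffie-Hellman value of its two children. The parameters (the Mersenne prime 2^127 - 1 with base 3) are illustrative and insecure, the leaf secrets are arbitrary fixed numbers, and the raw DH output is reused directly as the next exponent, a simplification of ART's mapping step. The helper names are hypothetical.

```python
P = 2**127 - 1  # Mersenne prime: toy group, NOT secure parameters
G = 3


def public(secret: int) -> int:
    return pow(G, secret, P)


def dh(own_secret: int, peer_public: int) -> int:
    # Shared value g^(own*peer) mod P; either child computes the same result
    return pow(peer_public, own_secret, P)


def node_secret(tree, leaf_secrets: dict) -> int:
    """Secret of a node: a leaf's own key, or the DH of its two children."""
    if isinstance(tree, str):
        return leaf_secrets[tree]
    left, right = tree
    return dh(node_secret(left, leaf_secrets), public(node_secret(right, leaf_secrets)))


leaves = {"A": 1234567, "B": 7654321, "C": 2468013}
# Siblings agree on their parent's secret regardless of which side computes it.
assert dh(leaves["A"], public(leaves["B"])) == dh(leaves["B"], public(leaves["A"]))

view_1 = (("A", "B"), "C")  # this side saw B join first
view_2 = (("A", "C"), "B")  # this side saw C join first
root_1 = node_secret(view_1, leaves)
root_2 = node_secret(view_2, leaves)
print(root_1 == root_2)  # the two sides derive different root secrets
```

Each member can compute the root of its own view, but the two views yield different group secrets, so ciphertext produced under one cannot be decrypted under the other.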
> Is there a way where MLS could allow a group to recover from a
> partition like this by (partially?) rebalancing the DH binary tree
> into a canonical form once the partition heals?  Thus if a partition
> was detected at the application layer (in Matrix's case, we'd do this
> by noting that the B-join and C-join events share the same A-join
> parent), the servers participating in the room would rebalance (AB,C)
> and (AC,B) to both be (AB,C) and then the conversation could proceed
> as normal. One might be able to avoid rebalancing the whole tree to
> minimise CPU impact (although in practice, races that need healing are
> fairly rare edge cases).  Obviously this assumes that one has a way to
> request keys to recover the messages lost when the groups were out of
> sync.
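
One way both sides could agree on a canonical form is to order concurrent joins deterministically. The Python sketch below assumes that joins sharing a parent are ordered by comparing event IDs. That tie-break rule, the helper names, and the left-nested tree layout (chosen to mirror the (AB,C) notation rather than MLS's real tree layout) are all assumptions, not part of the proposal.

```python
import heapq


def canonical_join_order(joins: dict) -> list:
    """joins maps event_id -> (parent_event_id or None, member).

    Kahn's topological sort with a min-heap, so joins that become ready at
    the same time (e.g. two joins racing on one parent) are taken in event-ID
    order. Every replica holding the same set of events therefore computes
    the same order, regardless of arrival order.
    """
    children = {eid: [] for eid in joins}
    pending = {eid: 0 for eid in joins}
    for eid, (parent, _member) in joins.items():
        if parent is not None and parent in joins:
            children[parent].append(eid)
            pending[eid] += 1
    ready = [eid for eid, n in pending.items() if n == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        eid = heapq.heappop(ready)
        order.append(joins[eid][1])
        for child in children[eid]:
            pending[child] -= 1
            if pending[child] == 0:
                heapq.heappush(ready, child)
    return order


def left_nested_tree(members: list):
    # ((A, B), C) style nesting, mirroring the notation in the message
    tree = members[0]
    for member in members[1:]:
        tree = (tree, member)
    return tree


# The two partitions received the racing joins in different orders.
side_1 = {"e1": (None, "A"), "e2": ("e1", "B"), "e3": ("e1", "C")}
side_2 = {"e1": (None, "A"), "e3": ("e1", "C"), "e2": ("e1", "B")}
print(left_nested_tree(canonical_join_order(side_1)))  # (('A', 'B'), 'C')
print(left_nested_tree(canonical_join_order(side_2)))  # (('A', 'B'), 'C')
```

Here both sides converge on (AB,C) only because "e2" sorts before "e3"; with real hash-based IDs the winner would be arbitrary but still identical on every replica, which is the property the rebalancing step needs.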
> If this mechanism makes sense, it seems that it could provide a way to
> eliminate the "Sequencing of State Changes" requirement entirely from
> MLS - or at least provide an alternative for folks where either
> server-side or client-side strict sequencing isn't an option, and so
> solve the general 'how to handle races' problem mentioned at the BOF
> (whilst also necessitating an interoperable solution to history/key
> sharing, similar to the earlier “Use cases for avoiding forward
> secrecy” thread[11]).
> Anyway, apologies for the stream of consciousness - feedback from
> those who properly understand ART & MLS would be hugely appreciated :)
> thanks,
> Matthew
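> [1] Matrix
> [2] Tox
> [3] Briar
> [4] Secure Scuttlebutt
> [5] Whisper
> [6] PSS
> [7] psyc2
> [8] XMPP FMUCs
> [9] Olm
> [10] Megolm
> [11] “Use cases for avoiding forward secrecy” thread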
> P.S. This has probably already been fixed, but the [signal] footnote
> on is
> credited to "T. and M. Marlinspike" - I think you are missing a 'Perrin'?