Re: [MLS] Unpredictable epochs?

Whether the channel is lossy or not doesn’t change what I’m saying. In an eventually consistent deployment, there has to be some mechanism for detecting and managing forks in the members' state. The question is whether the mechanism for telling which messages belong to which fork should be part of the consensus mechanism, which knows best how/why forks will occur, or part of the protocol, which doesn’t know why forks will occur.

To make my point, the PR you’ve opened uses a truncated hash of the last Commit message. This is not strong enough for some use-cases (where we might worry about adversary-generated collisions), and a waste of bandwidth in many other use-cases (where forks may never occur).

> On Jan 7, 2020, at 3:34 PM, Richard Barnes <rlb@ipv.sx> wrote:
> 
> Strictly speaking, yes, but only if everyone sees everything.  If a member doesn't see one of the two commits, they'll have no idea, and they'll try to decrypt with (potentially) the wrong keys.  Indicating the fork in the epoch identifier means that this case fails instead.
> 
> --Richard
> 
> On Sat, Jan 4, 2020 at 7:41 PM Brendan McMillion <brendan@cloudflare.com <mailto:brendan@cloudflare.com>> wrote:
> Strictly speaking, it’s already possible to detect forks. The content_type and epoch are left unencrypted, so if you have two messages with content_type=Commit and the same epoch number, that’s a fork. You can layer any sort of consensus mechanism you want on top of the protocol to handle those situations, whether it’s a central server, quorum voting, or something else.
> 
> It seems like the problem you’re trying to solve now is “How do we know which fork a message belongs to, if that message is sent after a fork happened?” And intuitively, I would think that this should also be dealt with by whatever consensus mechanism the DS uses. Since the mechanism exists, and already has to do all the work to keep track of forks and break ties.
> 
>> On Jan 3, 2020, at 8:55 AM, Richard Barnes <rlb@ipv.sx <mailto:rlb@ipv.sx>> wrote:
>> 
>> Yeah, I'm on the same page here. My objective is to remove the roadblock without trying to solve the whole problem.  As Kelvin points out, there's a lot of interesting thinking to be done about how to manage a DAG of state.
>> 
>> In chatting with Raphael just now, he proposed a nice framing -- that the tiebreaker bits could be used for fork *detection*.   Even in the case where the DS is trying to maintain order, there's a chance that things go wrong.  The tiebreaker bits would ensure that people don't try to use the wrong keys for a message in that case.
>> 
>> Here's a quick PR implementing the "extended epoch" proposal:
>> https://github..com/mlswg/mls-protocol/pull/281 <https://github.com/mlswg/mls-protocol/pull/281>
>> 
>> 
>> 
>> On Fri, Jan 3, 2020 at 10:32 AM Raphael Robert <raphael@wire.com <mailto:raphael@wire.com>> wrote:
>> I agree with Kelvin’s points in general. Allowing forks might be an interesting goal, but it raises a wealth of new questions.
>> On the other hand I think Richard’s proposal of extending the Epoch ID with a commit hash makes sense to detect potential forks.
>> 
>> I’d like to propose that we adopt that as a first step, but only in the context of fork detection for now. This doesn’t preclude us from building on top of that in the future, but we also don’t introduce something dangerous right now.
>> 
>> Raphael
>> 
>>> On 3 Jan 2020, at 00:15, Kelvin Ritland <kelvinr=40google.com@dmarc.ietf.org <mailto:kelvinr=40google.com@dmarc.ietf.org>> wrote:
>>> 
>>> Not-pseudorandom implies some assumptions, e.g., that each sender would only send one Commit on top of each epoch
>>> This isn't bad imo, it reduces complexity to some extent. I'm somewhat in favour of a non-pseudorandom tiebreaker if there's no other arguments against it.
>>> 
>>> How many bits of sequence number?
>>> Have we considered using varints? Similar to the CTLS proposal - https://tools.ietf.org/id/draft-rescorla-tls-ctls-02.html#rfc.section.3.1 <https://tools.ietf.org/id/draft-rescorla-tls-ctls-02.html#rfc.section.3.1>
>>> 
>>> 
>>> Other forking considerations, I think there's more involved than just changing the epoch identifier:
>>> 
>>> 1. We'll need merges as well, since forks can lead to inconsistent group membership:
>>> Suppose in epoch 1 a group has A, B, and C
>>> A removes C in epoch 2A
>>> B adds D in epoch 2B
>>> 
>>> We now have two group states 2A with AB and 2B with ABCD - and it's unclear which group to use to send with. We could mandate that any client MUST send a "merge" commit on it's view of history before sending any message. This commit could just be the list of epochs that are being merged (+a usual DirectPath update?), and processing it would mean replaying all proposals up to the common ancestor for those epochs on the common ancestor.
>>> 
>>> 2. Forward secrecy issues.
>>> By allowing forks, devices must now keep the previous epoch around to derive possible future forks - but this means we'll be re-using init_secret_. A possible solution could be to use the nth app secret from the ASTree instead of init_secret_ as the start for the new epoch secret. This ties nicely with using a non-pseudorandom tiebreaker.
>>> 
>>> 3. Ordering implications
>>> It's unclear what a sensible ordering strategy is for the DS if we allow forking. (simply ensuring that each commit is based on the "current" epoch isn't well defined since there can be multiple "current" epochs).
>>> 
>>> By allowing forks the DS need not enforce ordering at all imo.. This means that apps need to keep around old epochs, but this is already required per (2). A different question is how many we need to keep around - perhaps a TTL or the past N epochs after a topological sort would make sense.
>>> 
>>> Generally I think getting rid of server ordering is good idea:
>>> - Helps Matrix and general federation use cases
>>> - A common consistent mutated state for group operations can cause server side DB contention in my experience
>>> 
>>> Forks is one way to remove ordering, and if we need to support it anyways it's a nice benefit.
>>> 
>>> On Thu, Jan 2, 2020 at 12:52 PM Richard Barnes <rlb@ipv.sx <mailto:rlb@ipv.sx>> wrote:
>>> Argh, hit send too soon.  Continuing below...
>>> 
>>> On Thu, Jan 2, 2020 at 3:38 PM Richard Barnes <rlb@ipv.sx <mailto:rlb@ipv.sx>> wrote:
>>> Hey all,
>>> 
>>> Resurrecting this thread, as it seems that we never came to resolution on this question.
>>> 
>>> I'd like to propose we scope down and solve a narrower problem, namely the problem of enabling forks in the group's history.  So we're not going to try to provide any privacy properties, just remove the impediment to forking.  This seems like a worthwhile thing to me just because it seems silly to have forking blocked just because of syntactic constraint, in addition to it likely being necessary for some decentralized use cases (e.g., Matrix).
>>> 
>>> To allow forks, we really just need some "tie breaker" bits in the epoch ID so that if a given epoch has multiple successors, they end up with different epoch IDs.  And in order to avoid synchronization, each participant needs to be able to generate those bits independently.  There are a couple of basic questions about how to construct the tie breaker:
>>> 
>>> 1. Pseudorandom or not?
>>>   - Not-pseudorandom example: tiebreaker = commitSenderID
>>>   - Pseudorandom example: tiebreaker = H(MLSPlaintext(Commit))
>>>   - Not-pseudorandom implies some assumptions, e.g., that each sender would only send one Commit on top of each epoch
>>>   - Pseudorandom risks random collisions
>>>   - Possible to do both: tiebreaker = commitSenderID || H(MLSPlaintext(Commit))
>>> 
>>> 2. How many bits of tiebreaker?
>>>   - Non-pseudorandom size will be dictated by what we include
>>>   - Pseudorandom size will be dictated by tolerable collision probability
>>>   - Probability of collision ~ (number of forks per seq no) / 2^{number of bits}
>>> 
>>> 3. How many bits of sequence number?
>>> - DS might want to see sequence to help enforce ordering
>>> - Probably want this big enough to avoid wrapping.
>>> 
>>> (Note that this is a value that goes in every message, so there's some incentive to keep things small.)
>>> 
>>> Personally, my proposal would be something like:
>>> 
>>> struct {
>>>   uint64 sequence_number;
>>>   uint64 commit_hash; // H(MLSPlaintext(Commit))
>>> } EpochID;
>>> 
>>> That seems to strike a reasonable balance between low collision probability (~2^-64) and a reasonably small identifier.  If 128 bits is good enough for IPv6, it can be good enough for us :)
>>> 
>>> Does that seem workable to folks?
>>> 
>>> --Richard
>>> 
>>>  
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Fri, Apr 26, 2019 at 4:18 PM Benjamin Beurdouche <benjamin.beurdouche@inria..fr <mailto:benjamin.beurdouche@inria.fr>> wrote:
>>> 
>>> > On Apr 26, 2019, at 9:26 PM, Jon Callas <jon@callas.org <mailto:jon@callas.org>> wrote:
>>> > 
>>> >> On Apr 26, 2019, at 2:22 AM, Michael Rosenberg <micro@fastmail.com <mailto:micro@fastmail.com>> wrote:
>>> >> 
>>> >> So why not remove epoch entirely?
>>> > 
>>> > An epoch lets you deal with things happening neither too often nor not often enough. Presume there is a client that is either malicious or just stupid. You want to keep it from forcing a rekey every 100µs. You want to force a rekey every so often. Hence epochs. Yeah, picking the right epoch size is an exercise left to the reader.
>>> 
>>> You can’t use the epoch number for that as it is just global counter for group operations, we will have to keep track of the latest group operation “timestamp” for each member within the group state to check “update frequency” and handle some of the situations you described.
>>> 
>>> Btw in the TreeKEM formal spec I use a 64 bit unsigned integer, and I feel like having more than 2^32 group operations over the lifetime of a group is not unrealistic in certain extreme use cases, especially with large groups forcing PCS for application messages by triggering an update after each app message...
>>> 
>>> We could remove the epoch number if we really want but it is necessary to give the Delivery Service some ordering information (unpredictable or not is an interesting question) to handle concurrent handshake messages which is, I believe, the main current goal of that information.
>>> 
>>> Benjamin
>>> _______________________________________________
>>> MLS mailing list
>>> MLS@ietf.org <mailto:MLS@ietf.org>
>>> https://www.ietf.org/mailman/listinfo/mls <https://www.ietf.org/mailman/listinfo/mls>
>>> _______________________________________________
>>> MLS mailing list
>>> MLS@ietf.org <mailto:MLS@ietf.org>
>>> https://www.ietf.org/mailman/listinfo/mls <https://www.ietf.org/mailman/listinfo/mls>
>> 
>> _______________________________________________
>> MLS mailing list
>> MLS@ietf.org <mailto:MLS@ietf.org>
>> https://www.ietf.org/mailman/listinfo/mls <https://www.ietf.org/mailman/listinfo/mls>
>