Re: [Mimi] Some thoughts on introduction and discovery

Rohan Mahy <rohan.mahy@wire.com> Wed, 22 March 2023 01:05 UTC

MIME-Version: 1.0
References: <CABcZeBPK1Bv676O_W5AmZR9MPa9yViQ4nB7AmMbkHPzmzM+FjA@mail.gmail.com> <DACA5DFE-D362-4165-80B2-F346837845F5@raphaelrobert.com>
In-Reply-To: <DACA5DFE-D362-4165-80B2-F346837845F5@raphaelrobert.com>
From: Rohan Mahy <rohan.mahy@wire.com>
Date: Wed, 22 Mar 2023 10:05:22 +0900
Message-ID: <CACW8--MzW1F8uHaF3Bn5BgQG4MsWuM7J+GmhuS9dfqstrpBMGA@mail.gmail.com>
To: Raphael Robert <ietf=40raphaelrobert.com@dmarc.ietf.org>
Cc: Eric Rescorla <ekr@rtfm.com>, mimi@ietf.org
Content-Type: multipart/alternative; boundary="000000000000cc038705f772c132"
Archived-At: <https://mailarchive.ietf.org/arch/msg/mimi/FcEpr233EKqcisl-7q9qabBl-3Q>
Subject: Re: [Mimi] Some thoughts on introduction and discovery
X-BeenThere: mimi@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: More Instant Messaging Interoperability <mimi.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/mimi>, <mailto:mimi-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/mimi/>
List-Post: <mailto:mimi@ietf.org>
List-Help: <mailto:mimi-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/mimi>, <mailto:mimi-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 22 Mar 2023 01:05:44 -0000

Hi,
I am seeing a lot of good thoughts from ekr and Raphael here that I agree
with. I want to raise one point that concerns me and make one general
comment about privacy.

1) Raphael said:
> In addition, KeyPackages should generally be as short-lived as practically
> possible. In particular, so-called “last resort” KeyPackages (those that
> can never be deleted from the pool) should also have an expiry date and be
> rotated regularly.

My concern about short-lived KeyPackages is that many e2e encrypted systems
(including Wire) allow you to go on vacation for a few weeks, then come
back, collect your messages, process/decrypt them, and continue without
loss of functionality. If the KeyPackage lifetime is shorter than the amount
of time the client can be offline, then new Welcomes stop working and lots
of things start to break down. I don't want our solution to spam to hinge on
regular KeyPackages being valid for less than a few weeks. If you don't
have that requirement, feel free to create shorter-lived KeyPackages, but
please don't force those who do need longer lifetimes to go without.
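
To make the failure mode concrete, here is a minimal sketch. It assumes
the client uploads its KeyPackages when it was last online and that they
all share one lifetime; the function name and the dates are made up for
illustration and are not taken from any draft.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def unreachable_window(
    last_online: datetime,
    lifetime: timedelta,
    back_online: datetime,
) -> Optional[Tuple[datetime, datetime]]:
    """Return the interval in which no uploaded KeyPackage is valid.

    During that interval nobody can add the offline client to a group,
    because every KeyPackage it left behind has already expired.
    Returns None when the lifetime covers the whole offline period.
    """
    expires = last_online + lifetime
    if expires >= back_online:
        return None
    return (expires, back_online)


# Hypothetical three-week vacation.
left = datetime(2023, 3, 1, tzinfo=timezone.utc)
returned = datetime(2023, 3, 22, tzinfo=timezone.utc)

short = unreachable_window(left, timedelta(days=14), returned)
long = unreachable_window(left, timedelta(days=28), returned)
```

With a 14-day lifetime the client cannot be added during the last week
of the vacation; a 28-day lifetime closes the gap.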

2) A general note about privacy: We can put people in different categories
according to their expectation of privacy of their identifiers, because
they can be very different. For example, a salesperson might already
publish their work cell phone, email address, calendar link, and IM handle
on their public web page. A celebrity or political dissident might not be
discoverable/searchable even if one of their identifiers is known by the
searcher. I think our use cases have largely focused on ordinary
consumer and non-public-facing business users, which is a good default use
case, but let's please occasionally test that these other user types work
too.

Thanks,
-rohan





*Rohan Mahy  *l  Vice President Engineering, Architecture

Chat: @rohan_wire on Wire



Wire <https://wire.com/en/download/> - Secure team messaging.

*Zeta Project Germany GmbH  *l  Rosenthaler Straße 40, 10178 Berlin, Germany

Geschäftsführer/Managing Director: Alan Duric

HRB 149847 beim Handelsregister Charlottenburg, Berlin

VAT-ID DE288748675


On Wed, Mar 22, 2023 at 5:33 AM Raphael Robert
<ietf=40raphaelrobert.com@dmarc.ietf.org> wrote:

> Thanks for writing this up. I want to delve into a lot of the details but
> for the sake of clarity I’ll just reply on a few of the points for now. For
> context, we spent the past months researching solutions to the tension
> between the need for privacy and the functional requirements you mention.
>
> On 21. Mar 2023, at 14:49, Eric Rescorla <ekr@rtfm.com> wrote:
>
> I've been doing some thinking about the overall MIMI architectural
> questions and wanted to try to focus on one in particular, namely
> around setting up communication.  At a high level, we have the
> following three main scenarios, in descending order of Alice's
> knowledge:
>
> 1. Alice knows Bob's provider and an identifier within that provider
>    scope (e.g., a phone number and that he uses WhatsApp) (an SSI).
>
> 2. Alice knows some sort of unique identifier for Bob (e.g., an E.164
>    number or an e-mail address) but not what provider he is using
>    (an SII).
>
> 3. Alice knows some out-of-band identifier for Bob but not any instant
>    messaging address (this is what draft-rosenberg-mimi-transport
>    assumes).
>
> In all of these cases, we want to go from that identifier to a
> situation where Bob has consented to communicate with Alice or what
> many systems would call being a "contact". Once that relationship
> is established, Alice can message Bob or try to add him to groups.
>
>
> # Security Requirements
> Many of the security requirements are covered by MLS, but not all.
>
> Spam
> : In discussions so far, we've assumed that one of the primary
> requirements is that Alice not be able to spam Bob. Specifically, that
> she should not be able to send messages to Bob or add him to groups
> without first getting his consent. She should also not be able to use
> invitations as a form of spam.
>
> Consent to join groups
> : JDR's draft seems to assume that Alice shouldn't be able to add
> Bob to groups without his consent. I think we should discuss this,
> as most systems I am familiar with do not have this property,
> especially for 1:1 messages. Even for larger groups, many systems
> allow people to be added but they can remove themselves.
> It's important to distinguish two cases here:
>
> (1) Someone is added to a group but doesn't have to see the messages.
> (2) Someone is added to the group and also gets all the messages.
>
> One can imagine a design where you are always added to groups
> at the MLS layer, but the messages are quarantined until you
> join.
>
>
> I think it is even more fine-grained:
>
> (1) Adding someone to a group is suppressed by the server (typically in a
> lot of systems today). Ideal solution.
> (2) Adding someone is silently suppressed by the client. Not ideal in
> terms of resource consumption, but at least the user isn’t bothered.
> (3) Adding someone to a group always works. This is to be avoided, since
> it’s quite disturbing for users.
>
> I’m confident we can achieve (1) in most cases, when consent needs to be
> given first (see consent below). For situations where consent has been
> retracted but perhaps not propagated instantly, (2) might be ok.
>
> With MLS, there might be a relatively simple way to achieve (1) in practice:
> There can be different KeyPackages for 1:1 connections and for groups. The
> KeyPackages can be “tagged” with an extension set by the generating client,
> so we’d have Connection KeyPackages and Group KeyPackages. The latter would
> only be made available to users once a 1:1 connection exists. Connection
> KeyPackages would still be easily obtainable, but couldn’t be used in group
> chats because all members in the group would reject them.
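
A minimal sketch of the tagging idea above. The `Purpose` tag stands in
for the extension set by the generating client; its name, the
`may_fetch` directory check and the `members_accept_in_group_chat`
validation rule are all assumptions for illustration, not anything
defined by MLS or MIMI.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Set


class Purpose(Enum):
    CONNECTION = "connection"  # usable for 1:1 connection requests
    GROUP = "group"            # usable when adding someone to a group chat


@dataclass(frozen=True)
class KeyPackage:
    owner: str
    purpose: Purpose  # hypothetical extension value set by the client


def may_fetch(
    requester: str,
    target: str,
    purpose: Purpose,
    connections: Set[FrozenSet[str]],
) -> bool:
    """Directory-side rule: Connection KeyPackages are easy to obtain,
    Group KeyPackages are released only once a 1:1 connection exists."""
    if purpose is Purpose.CONNECTION:
        return True
    return frozenset({requester, target}) in connections


def members_accept_in_group_chat(kp: KeyPackage) -> bool:
    """Member-side rule: reject an Add to a group chat that uses a
    Connection KeyPackage."""
    return kp.purpose is Purpose.GROUP


connections = {frozenset({"alice", "bob"})}
```

Here Carol, who has no connection with Bob, can still fetch a Connection
KeyPackage, but a group Add built from it would be rejected by the other
members.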
>
>
> Key Exhaustion
> : It should not be possible for Alice to exhaust all of Bob's KeyPackages.
>
>
> I guess the underlying assumption is that there is a pool of KeyPackages
> each user uploads (or otherwise makes available), and “depletion” relates
> to the pool.
> This is also quite similar to how things work for Signal and similar
> protocols.
> The inherent downside with this design is that there is no way to really
> guarantee a pool cannot be depleted other than having a *really* large pool
> and enforcing very strict rate-limiting, neither of which is desirable.
> Since we cannot guarantee anything, we should instead focus on getting the
> best practical security properties.
>
> A recent change in the MLS spec has introduced a second HPKE public key in
> the KeyPackage that is used only for encrypting a Welcome message, thereby
> improving forward secrecy. Combined with the rule that new members should
> update their key material as soon as they join a group, re-using
> KeyPackages is still not ideal but perhaps tolerable.
>
> In addition, KeyPackages should generally be as short-lived as practically
> possible. In particular, so-called “last resort” KeyPackages (those that
> can never be deleted from the pool) should also have an expiry date and be
> rotated regularly.
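
A minimal sketch of such a pool, assuming one-time KeyPackages are
consumed on use while a "last resort" KeyPackage is handed out
repeatedly until it expires. The class and field names are hypothetical,
and expiry is modelled as plain seconds rather than the MLS lifetime
structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class KeyPackage:
    ident: str
    not_after: float          # expiry time in seconds (illustrative)
    last_resort: bool = False


@dataclass
class Pool:
    packages: List[KeyPackage] = field(default_factory=list)

    def take(self, now: float) -> Optional[KeyPackage]:
        # Expired packages are dropped, including an expired last-resort one.
        self.packages = [kp for kp in self.packages if kp.not_after > now]

        # Prefer a one-time package and consume it.
        one_time = next((kp for kp in self.packages if not kp.last_resort), None)
        if one_time is not None:
            self.packages.remove(one_time)
            return one_time

        # Otherwise fall back to the last-resort package without consuming it.
        return next((kp for kp in self.packages if kp.last_resort), None)


pool = Pool([
    KeyPackage("a", not_after=100.0),
    KeyPackage("lr", not_after=200.0, last_resort=True),
])
```

Once the one-time package is gone, every request gets the same
last-resort package until it expires, at which point the pool is empty
unless the client has rotated in a fresh one.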
>
>
> Privacy for service discovery
> : In the case where Alice doesn't know Bob's service, we need to decide
> whether he needs to consent before she can discover it. I think there
> is an obvious privacy argument here, but it's generally possible to
> test whether Bob is on a *specific* service, and there may be a tension
> with privacy for contact discovery.
>
>
> Privacy for contacts
> : Ideally it would not be possible for arbitrary people to determine
> who Alice's contacts are. When Alice messages Bob, it's probably hard
> (though not impossible) to conceal this from their respective
> providers, but ideally (1) no third party should know and (2) the
> providers shouldn't know if Alice just has Bob in their contact list
> (e.g., in your OS phonebook) but hasn't actually asked to connect.
>
>
> I am convinced that concealing contact lists of users from their own
> service should be a design goal for consumer-grade messaging as best as
> possible. Signal has achieved this (in that the contact list is not stored
> on the service, however still observable by the service) and proposing a
> system that has much worse privacy guarantees would look like taking a step
> back.
>
> I think that this level of privacy is not impossible to achieve. In
> practice, we have been experimenting with techniques like pseudonymization
> and protocols like Privacy Pass to build something that doesn’t reveal
> contacts to the service, yet still has the usual functional requirements
> known from consumer-grade messaging systems.
>
> For enterprise-grade messaging (and by that I generally mean messaging in
> the context of organizations), the privacy requirements are typically not
> so strict, e.g. a company already knows who its employees are. However,
> leaking contact lists across services would most likely still be a problem.
>
> This is where I see one of the fundamental differences between business
> messaging and consumer messaging. A one size fits all approach might not
> work here. The ideal outcome would however be that one is just a variation
> of the other at the protocol level.
>
>
>
> # Thoughts On Architecture
>
> I think we should start by trying to solve scenario 1, in which Alice
> has an SSI for Bob. This is clearly easier than solving (2) and I
> don't think (3) is an acceptable user experience.  This also lets us
> build a staged solution where we have a separate system that goes from
> an SII (scenario 2) to an SSI (scenario 1).
>
>
> ## Establishing Consent
>
> In the existing closed systems, the basic consent idiom is that Alice
> is allowed to send Bob an invitation to connect, typically with no
> message (e.g., "Alice would like to connect") or maybe some kind of
> very restricted message (e.g., only a few characters). Given that
> Alice already has Bob's identifier (by assumption), this is
> straightforward technically. These restrictions don't make spam
> impossible of course. For instance, you can change your Display Name
> to be a spam message. We don't have too much information about how
> these platforms control spam, but presumably they do some kind of
> monitoring of the number of messages people send, acceptance rate,
> etc.
>
> Obviously this is harder where we don't have a single system, so we
> need to figure out how to address that. There's some obvious tension
> here because a lot of the obvious designs involve being able to see
> the connection graph of the inviter (and potentially the invitee),
> which has privacy issues. It seems to me like there are several
> main approaches here, which we can sort of mix-and-match:
>
> - Trust the inviter's side to do all spam prevention, but with
>   the peer having the option to defederate if the overall
>   stats look bad.
>
> - Trust the inviter's side to provide overall reputation data
>   for the inviter (e.g., number of connections, number of
>   outstanding unaccepted connections, etc.) so that the peer
>   can do spam suppression.
>
> - Have some level of public and verifiable information about
>   the inviter's connection status (insert crypto handwaving
>   here if we also want privacy) that the peer can use.
>
> I think we need to know a lot more about how existing systems
> behave before we can design something in detail.
>
>
> FWIW, I think the spam protection might be easier to implement on the
> receiver’s service. Given that in a federated architecture we have
> additional threat models to consider, where not only users can be
> malicious towards servers but also servers can be malicious towards
> servers, things get more complicated.
> In the scenario where Alice wants to connect to Bob, Bob’s service is
> probably the best suited entity to protect Bob’s device from receiving the
> connection request.
>
>
>
> ## Consent and Messaging
>
> Once Alice has Bob's consent, then she can send him a message.
> However, if we explicitly serialize these then we add a lot of
> latency. I.e.,
>
>
> Alice              DS                  Bob
>
>                                  [Offline]
> Connect? ---------->
> [Goes offline]
>                       ------------------->
>                              [Goes online]
>                      <----------- Accepted
>                        [Goes offline]
>
> [Goes online]
> <---------- Accepted
> Hello ------------->
>
>                              [Goes online]
>                       Hello ------------->
>
> A better user experience is if Alice can send a message in parallel
> with her connection request but that Bob only sees it if he accepts
> the connection. I.e.,
>
> Alice              DS                  Bob
>
>                                  [Offline]
> Connect? ---------->
> Hello ------------->
> [Goes offline]
>                       ------------------->
>                              [Goes online]
>                      <----------- Accepted
>                       Hello ------------->
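
A minimal sketch of the second flow, assuming the delivery service (DS)
holds Alice's early message until Bob accepts. The class and method
names are hypothetical; in an E2E system the held payload would be
ciphertext, and holding it on Bob's client instead of the DS is an
equally possible variant.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class DeliveryService:
    def __init__(self) -> None:
        # (sender, recipient) pairs for which the recipient has consented.
        self.accepted: Set[Tuple[str, str]] = set()
        # Messages sent before consent, keyed by (sender, recipient).
        self.held: DefaultDict[Tuple[str, str], List[str]] = defaultdict(list)
        # What each recipient actually gets to see, as (sender, payload).
        self.inbox: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def connect(self, sender: str, recipient: str) -> None:
        # The connection request itself is always forwarded.
        self.inbox[recipient].append((sender, "Connect?"))

    def send(self, sender: str, recipient: str, payload: str) -> None:
        if (sender, recipient) in self.accepted:
            self.inbox[recipient].append((sender, payload))
        else:
            self.held[(sender, recipient)].append(payload)

    def accept(self, recipient: str, sender: str) -> None:
        # Consent releases everything that was queued in the meantime.
        self.accepted.add((sender, recipient))
        for payload in self.held.pop((sender, recipient), []):
            self.inbox[recipient].append((sender, payload))


ds = DeliveryService()
ds.connect("alice", "bob")
ds.send("alice", "bob", "Hello")
before_accept = list(ds.inbox["bob"])
ds.accept("bob", "alice")
after_accept = list(ds.inbox["bob"])
```

Bob sees only the connection request until he accepts, after which the
queued "Hello" is delivered without another round trip to Alice.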
>
>
> In a non-E2E-encrypted system, this is straightforward, but in
> an E2E system we need to worry about KeyPackage exhaustion by
> Alice. It seems like there are a number of possible designs that
> would work here, including:
>
> - Rate limiting KeyPackages to non-contacts
> - Having a special non-contact KeyPackage that didn't necessarily
>   offer PFS.
> - The same kind of anti-spam systems we discussed above.
>
> In any case, once Alice has connected with Bob, I think we should
> probably let her send group invites and messages concurrently.
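
A minimal sketch of the first option, rate limiting KeyPackage fetches
by non-contacts, using a sliding window per (requester, target) pair.
The class name, the limits and the choice of key are assumptions for
illustration only.

```python
from collections import defaultdict, deque
from typing import Deque, DefaultDict, Tuple


class KeyPackageRateLimiter:
    def __init__(self, max_fetches: int, window_seconds: float) -> None:
        self.max_fetches = max_fetches
        self.window = window_seconds
        # Timestamps of recent non-contact fetches per (requester, target).
        self._history: DefaultDict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, requester: str, target: str, is_contact: bool, now: float) -> bool:
        # Contacts are not limited in this sketch.
        if is_contact:
            return True
        recent = self._history[(requester, target)]
        # Forget fetches that have left the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_fetches:
            return False
        recent.append(now)
        return True


limiter = KeyPackageRateLimiter(max_fetches=2, window_seconds=60.0)
results = [
    limiter.allow("alice", "bob", is_contact=False, now=t)
    for t in (0.0, 1.0, 2.0, 61.0)
]
```

Keying on the pair only slows down a single requester; an attacker with
many identities would need a per-target limit or one of the other
anti-spam measures on top.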
>
>
> I think this goes hand-in-hand with what was discussed regarding the
> consent to join groups.
>
> For connection requests, at the protocol level Alice can immediately and
> asynchronously encrypt data for Bob. In other words, Alice could send a
> connection request and also encrypt an initial message (or more) to Bob
> even though Bob has not accepted the connection yet.
> On Bob’s side, it might be wise not to show all the content Alice sent,
> because that is usually the juicy bit for spam. Balancing this is a hard
> task and different systems take different approaches.
>
> The levels of stopping that spam would again be similar to above:
>
> (1) The threat actor is stopped by a server
> (2) The threat actor can push content to the target's device, but the
> target doesn’t see the content
> (3) The threat actor can push content to the target’s device, and the
> content is shown to the target
>
> While this remains a gnarly problem – particularly because of the tension
> between privacy and functional requirements – at least the scope is only
> 1:1 connection requests, not groups.
>
> Regarding KeyPackage exhaustion: as mentioned earlier, I don’t think the
> problem is as bad. Since in a non-adversarial situation key material is
> rotated immediately when the connection request is accepted, the FS
> problem is limited to either spam or “early” messages that are sent before
> the request is accepted. But maybe I’m missing something here.
>
> Raphael
>
>
>
> ## Identifier Discovery
>
> If we just had a design that fixed the SSI problem, then we would
> actually be in not terrible shape. It's annoying to have to pick
> people's service from a list, but it's better than nothing.  Moreover,
> if we have a solution to the consent problem with SSIs, then we can
> try to build a separate lookup mechanism for mapping an SII onto an
> SSI. At a high level, there are two classes of solution:
>
> - Contacting the user using the SII (a la SPIN or
> draft-rosenberg-mimi-transport)
> - Directory systems
>
> The main drawback of contacting the user directly using the SII is
> that it's clunky. In addition, in some variants (SPIN), it depends on
> non-open properties of the receiver's client device, which isn't
> ideal.
>
> The main challenge with directory systems is privacy. As noted above,
> there are potential privacy issues. The obvious design is just to
> publish a directory mapping any SII to the set of SSIs/services it is
> associated with, but this minimally allows the set of federated
> services (which in an open system means basically everyone) to learn
> the contents of the directory. In addition, the directory service gets
> to learn that Bob wants to call Alice, whereas ideally just Alice and
> Bob's providers would learn about it.
>
> I've been doing some thinking about crypto-type designs for addressing
> these issues
> (https://educatedguesswork.org/posts/messaging-discovery/#privacy),
> but it's not really perfect, and it's only a partial scraping defense.
>
> -Ekr
>
> --
> Mimi mailing list
> Mimi@ietf.org
> https://www.ietf.org/mailman/listinfo/mimi