Re: [Tools-discuss] matrix tests

Matthew Hodgson <matthew@matrix.org> Thu, 17 December 2020 23:13 UTC

To: tools-discuss@ietf.org
References: <e547d7be-8838-ce7f-fad5-61af474f7d12@cs.tcd.ie> <15081.1607893032@localhost> <a798569d-0873-c199-1554-8c3aa85e0923@amsl.com>
From: Matthew Hodgson <matthew@matrix.org>
Organization: The Matrix.org Foundation C.I.C
Message-ID: <c7aa5f78-7afc-6518-420d-276fc4f5a5da@matrix.org>
Date: Thu, 17 Dec 2020 23:13:02 +0000
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:84.0) Gecko/20100101 Thunderbird/84.0
MIME-Version: 1.0
In-Reply-To: <a798569d-0873-c199-1554-8c3aa85e0923@amsl.com>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/G4-c-2P9R3I5dJQcWI2_QJGSM58>
Subject: Re: [Tools-discuss] matrix tests

Better late than never: I can try to provide more clarity on Matrix's
data processing/retention semantics - please see comments below...

On 13/12/2020 23:02, Glen wrote:

> Stephen Farrell <stephen.farrell@cs.tcd.ie> wrote:
>> It didn't occur to me to create any matrix rooms. If it had, I'm not
>> sure if that'd have been useful but I can imagine it being so in
>> future, e.g. for a bar bof. Before that'd be a standard procedure I'd
>> like to understand the duration for which any such rooms might last,
>> how closed/open they may be etc.
>
> I can actually report some technical data on that.
>
> As I was performing server maintenance over the past few days, I 
> discovered that the Matrix database was taking up a significant amount 
> of space (91 GB for 37 local users.).

There are several reasons for this:

  * The server has been provisioned using SQLite rather than PostgreSQL as
the DB.  SQLite is meant to be just for evaluation, not for production,
and I dread to think how fragmented or unoptimised a 91GB SQLite DB has
got :)
  * Synapse's schema is currently quite naive in how it snapshots the
state of its chatrooms.  It currently takes a snapshot every 100 state 
changes (where state changes are things like room membership changes; 
not normal messages).  Thus a room with 10,000 members causes another 
10,000 db rows to be inserted every 100 joins or parts.  This is 
obviously terrible; there's a workaround to apply a better compression 
algorithm via https://github.com/matrix-org/rust-synapse-compress-state, 
which needs to be merged into Synapse proper.
  * As you've pointed out, Matrix works by replicating conversations 
between servers.  So even if you have 1 user, if they go nuts and try to 
join 1,000 busy rooms with huge numbers of participants, they can suck 
up a lot of disk space.  Conversely, if you have 1,000 users and they 
are all in tiny rooms, the opposite could be true.  Matrix is more like 
hosting an IMAP server than (say) an SMTP or XMPP server in this respect.
  * You can configure the server to limit the data it stores for
the rooms its users are in, and I assume this hasn't been configured:
https://github.com/matrix-org/synapse/blob/44b7d4c6d6d5e8d78bd0154b407defea4a35aebd/docs/sample_config.yaml#L410-L421.
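
To make the snapshot arithmetic in the second point concrete, here is a
rough sketch in Python.  It is a simplified model, not Synapse's actual
schema: it assumes each snapshot writes one row per room member and that
membership stays roughly constant.  The function name and parameters are
made up for illustration.

```python
def snapshot_rows(members: int, state_changes: int, snapshot_interval: int = 100) -> int:
    """Rows written by full-state snapshots under a simplified model.

    Assumes one row per room member per snapshot and a roughly constant
    member count; both are simplifications for illustration only.
    """
    snapshots = state_changes // snapshot_interval
    return snapshots * members


# A 10,000-member room that sees 1,000 joins or parts takes 10 snapshots.
print(snapshot_rows(members=10_000, state_changes=1_000))  # 100000
```

Under this model the row count grows with members multiplied by churn,
which is why a handful of very large, busy rooms can dominate the
database.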

> I put in an inquiry but didn't find anyone online (likely people are 
> offline for the weekend) so I did some digging myself.

Sorry - I think our messages crossed while you were writing the original 
mail :)

> Unlike Jabber, where a room is hosted on a single server; in Matrix, 
> it turns out that every room that anyone belongs to is co-hosted on 
> every server used by any of that room's users. [1] Even if all (local) 
> users leave the room, the room remains on the server, along with all 
> of the shared data.

This isn't strictly true.  If all local users leave the room, and 
"forget" it (equivalent to expunging an IMAP folder - it's an option 
after you leave a room in the 'Historical' section in Element), then the 
data is free to be garbage collected.
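
As a minimal sketch of that rule (the state names and the function are
hypothetical labels for illustration, not Synapse's data model): a room
only becomes eligible for collection once no local user is joined and
every local user who left has also forgotten it.

```python
from enum import Enum
from typing import Mapping


class LocalState(Enum):
    JOINED = "joined"
    LEFT = "left"            # left, but the room is still listed as historical
    FORGOTTEN = "forgotten"  # left and then explicitly forgotten


def room_is_collectable(local_users: Mapping[str, LocalState]) -> bool:
    """True once every local user has both left and forgotten the room.

    A room with no local users at all is also treated as collectable.
    """
    return all(state is LocalState.FORGOTTEN for state in local_users.values())


users = {
    "@alice:example.org": LocalState.FORGOTTEN,
    "@bob:example.org": LocalState.LEFT,
}
print(room_is_collectable(users))  # False: bob has left but not forgotten
```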

> Moreover, if I understand what the developers explained to me, it is 
> not possible to delete a room "everywhere".  I can delete a room that 
> exists on the local server, but that does not delete the room from 
> other servers to which the room was shared.

In theory you could request other servers delete the room and hope they 
honour the request.  However, I'm not sure that I'd like some random
admin/moderator/user on a remote server to decide to reach in and 
destroy my copy of a room on my server - so we haven't implemented it yet.

> And if our server somehow became disconnected from the rest, all of 
> the rooms that had been shared out to other servers would continue to 
> exist on those other servers forever as orphaned clones (for, as we 
> like to say, some value of "forever"), outside of our ability to 
> control (or perhaps even enumerate.)

They wouldn't really be "orphaned clones", any more than a local git 
repository clone is an "orphaned clone" of whatever you cloned it from.  
To be orphaned implies you once had parents.  In Matrix, the rooms are 
shared over their participating servers, and all servers are equal in 
that respect.

> So unlike operating a Jabber server, where the conference server is 
> the "authoritative" location of the room (and logs are captured 
> there), and, it turns out, unlike operating a Zulip server (which also 
> has an authoritative location).... operating a Matrix server isn't 
> really "operating" a server so much as "joining a huge network of peer 
> servers."

Yup, precisely. It's the same difference between using SVN and using 
Git.  Most people consider the lack of an authoritative server to be an
extremely desirable feature - it means that if I join IETF rooms from my 
personal server, I have full sovereignty over my copy of the 
conversation, even if my 'net connection goes down, or even if something 
terrible happened to the IETF server, etc.  I have network partition 
tolerance.

> As the IETF's operator, I do not set policy, so this is just my 
> opinion... but I *think* that having our Matrix server mirroring some 
> of the rooms in my list above seems... problematic. 

I would expect this to be no worse than (say) running a caching HTTP 
proxy for users.  If your users choose to go view weird stuff out there 
on Matrix, then your server ends up with a local cache of that data 
until it expires.

However, if you wanted to limit the servers your server can federate 
with, that's possible too (but not very idiomatic - just as it wouldn't 
be idiomatic to specify a whitelist or blocklist of email servers to 
talk SMTP with): 
https://github.com/matrix-org/synapse/blob/44b7d4c6d6d5e8d78bd0154b407defea4a35aebd/docs/sample_config.yaml#L673-L695
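
Conceptually, restricting federation is just a membership check on the
remote server's name.  A minimal sketch of the idea follows; the function
and the example domains are hypothetical, and this is not how Synapse
implements the option linked above.

```python
from typing import Optional, Set


def may_federate(server_name: str, allowed: Optional[Set[str]]) -> bool:
    """Return True if traffic with server_name is permitted.

    allowed=None models open federation (no restriction); a set models
    an explicit list of permitted servers.
    """
    if allowed is None:
        return True
    return server_name in allowed


allowed_servers = {"ietf.example", "matrix.example"}
print(may_federate("matrix.example", allowed_servers))  # True
print(may_federate("random.example", allowed_servers))  # False
print(may_federate("random.example", None))             # True
```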

> If it turns out that this is the behavior desired by the IETF, then no 
> changes are needed. If, however, the IETF wants to operate the Matrix 
> server in a more-or-less authoritative, "Note Well"-compliant kind of 
> way, then I *think* the Matrix server would need to be deployed with 
> federation disabled (meaning, local user signups only, and no room 
> sharing across the world), and public room sharing disabled as well.  
> We can certainly do this, but we have not tested that kind of setup yet.

You could do this, but it would very much miss the fundamental nature of 
Matrix - the equivalent of realising that because an internet-connected 
LAN could be used by its users to access unexpected content on the 
internet, one should isolate it from the internet.  I think this might 
be counter to the IETF's ethos :)

In terms of being authoritative in a Note Well kind-of way, it's worth 
noting that the replicated copies of rooms which other servers maintain 
are cryptographically signed to match each other.  So from a logical 
perspective, they do not diverge (other than during a netsplit).
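
A toy illustration of why replicas can be checked against each other: if
every event commits to the hash of its predecessor, two servers holding
the same history end up with the same head hash, and any difference in
history changes it.  This sketch deliberately uses plain SHA-256 hash
links instead of signatures, and a linear log instead of Matrix's event
graph; the event fields and function names are invented for the example.

```python
import hashlib
import json
from typing import List, Optional


def event_hash(event: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the event."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_event(log: List[dict], sender: str, body: str) -> None:
    """Append an event that commits to the hash of its predecessor."""
    prev = event_hash(log[-1]) if log else None
    log.append({"sender": sender, "body": body, "prev": prev})


def head(log: List[dict]) -> Optional[str]:
    """Hash of the newest event; it transitively covers the whole log."""
    return event_hash(log[-1]) if log else None


def build(first_body: str) -> List[dict]:
    log: List[dict] = []
    append_event(log, "@alice:a.example", first_body)
    append_event(log, "@bob:b.example", "hi")
    return log


server_a = build("hello")
server_b = build("hello")
server_c = build("hello!")  # differs only in the first event

print(head(server_a) == head(server_b))  # True: identical histories
print(head(server_a) == head(server_c))  # False: histories differ
```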

> But unless they make that choice, then the answer to Stephen's 
> questions seem to be:
>
> > the duration for which any such rooms might last,
>
> "forever", with no option to delete/close unless federation is 
> disabled, and

This is not correct. The server admin can boot rooms off the server, or 
specify a retention policy, or the room admin can specify a retention 
policy.  If all users have left and forgotten the room, the server could 
garbage collect it (if desired, although this hasn't landed in Synapse 
yet). Federation has no impact on the ability to delete/close a room.
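
For a feel of what a retention policy amounts to, here is a minimal
sketch that drops events older than a maximum lifetime.  The event shape,
names and numbers are assumptions for illustration; Synapse's real purge
logic is not shown here and may well differ.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp in ms, body); hypothetical shape

DAY_MS = 24 * 60 * 60 * 1000


def apply_retention(events: List[Event], now_ms: int, max_lifetime_ms: int) -> List[Event]:
    """Keep only events that are no older than max_lifetime_ms."""
    cutoff = now_ms - max_lifetime_ms
    return [event for event in events if event[0] >= cutoff]


now = 100 * DAY_MS
history = [
    (now - 10 * DAY_MS, "ten days old"),
    (now - 1 * DAY_MS, "one day old"),
]
kept = apply_retention(history, now_ms=now, max_lifetime_ms=7 * DAY_MS)
print([body for _, body in kept])  # ['one day old']
```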

> > how closed/open they may be etc.

> "wide open to every Matrix server of every participating user."

This is not necessarily true: if data privacy is a concern, you can 
enable end-to-end encryption on the room, and then the data is not 
visible to *any* server - just the authorised participating users' devices.

Aside from end-to-end encryption, Matrix rooms also impose decentralised 
ACLs to limit which users and servers can see the history based on the 
room admin's requirements; it turns out this is a novel area of computer 
science: 
https://matrix.org/blog/2020/06/16/matrix-decomposition-an-independent-academic-analysis-of-matrix-state-resolution.

So while it's true that new messages in an unencrypted room are wide
open and visible to all servers which have users currently participating
in that room, it is not true to say that past history is visible (unless 
explicitly shared), and future messages are not replicated after a 
server's users leave.
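
Put as a small sketch: the set of servers that receive a new message is
derived from who is joined at that moment.  Matrix user IDs have the form
@localpart:server; everything else here (function names, example domains)
is hypothetical, and history-visibility rules and encryption are not
modelled.

```python
from typing import Iterable, Set


def server_of(user_id: str) -> str:
    """Extract the server part of a Matrix user ID like '@alice:example.org'."""
    return user_id.split(":", 1)[1]


def receiving_servers(joined_users: Iterable[str]) -> Set[str]:
    """Servers that currently have at least one joined user in the room."""
    return {server_of(user) for user in joined_users}


joined = {"@alice:ietf.example", "@carol:ietf.example", "@bob:matrix.example"}
print(sorted(receiving_servers(joined)))  # ['ietf.example', 'matrix.example']

# Once bob leaves, matrix.example has no joined users and stops receiving
# new messages for the room.
joined.discard("@bob:matrix.example")
print(sorted(receiving_servers(joined)))  # ['ietf.example']
```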

Hope this provides some clarity.  Apologies that the mental process of 
familiarisation with Matrix is a paradigm shift similar to going from a 
VCS to a DVCS, but when it clicks, much like a DVCS, its characteristics 
hopefully feel desirable rather than unexpected and weird.

thanks,

Matthew


-- 
Matthew Hodgson
Matrix.org