[Jmap] [Jmap for Sieve] Binary/blob vs. Inline

Hans-Joerg Happel <happel@audriga.com> Thu, 21 October 2021 16:42 UTC

To: jmap@ietf.org
From: Hans-Joerg Happel <happel@audriga.com>
Message-ID: <a192b2f3-4b4c-1d94-7302-fab59749528a@audriga.com>
Date: Thu, 21 Oct 2021 18:41:26 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/jmap/YE-lpjZZmCN6o5saUS-CnFIEb6s>
Subject: [Jmap] [Jmap for Sieve] Binary/blob vs. Inline

Hi,

As discussed in today's JMAP WG call and in prior threads on this list, 
a design decision is needed on the minimal set of API calls required to 
read or write a Sieve script.

The currently proposed procedure is non-atomic:

READ SCRIPT:
R1) obtain script "metadata" (including blobId)
R2) obtain actual script using blobId

WRITE SCRIPT:
W1) upload actual script as blob (obtaining blobId)
W2) set script "metadata" using SieveScript/set method


We see two issues with this approach:
1) It is not atomic, which may leave orphaned blobs behind or else 
requires the client (or the server) to clean up (e.g., removing the 
blob if the second write call fails)
2) The whole approach seems to *assume the existence of a (global) blob 
store*, which *might not hold in general*
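
To illustrate issue (1), here is a minimal Python sketch of the two-step 
write against a toy in-memory server. All names (`ToyServer`, 
`upload_blob`, `sieve_script_set`, `write_script`) are hypothetical 
stand-ins for the blob upload and the SieveScript/set call; this is not a 
JMAP implementation, only a model of the failure mode.

```python
import uuid


class ToyServer:
    """Toy stand-in for a JMAP server (hypothetical, for illustration only)."""

    def __init__(self):
        self.blobs = {}    # blobId -> script bytes
        self.scripts = {}  # script name -> blobId

    def upload_blob(self, data: bytes) -> str:
        # Step W1: store the script content and hand back a blobId.
        blob_id = "blob-" + uuid.uuid4().hex
        self.blobs[blob_id] = data
        return blob_id

    def sieve_script_set(self, name: str, blob_id: str) -> None:
        # Step W2: attach the blob to the script metadata.
        # This step can fail independently of W1.
        if name in self.scripts:
            raise ValueError("script name already exists: " + name)
        if blob_id not in self.blobs:
            raise KeyError("unknown blobId: " + blob_id)
        self.scripts[name] = blob_id

    def orphaned_blobs(self) -> set:
        # Blobs that no script refers to.
        return set(self.blobs) - set(self.scripts.values())


def write_script(server: ToyServer, name: str, content: bytes) -> bool:
    blob_id = server.upload_blob(content)       # W1 succeeds
    try:
        server.sieve_script_set(name, blob_id)  # W2 may fail
        return True
    except (ValueError, KeyError):
        # The blob from W1 now lingers until the client or server cleans up.
        return False
```

In this model, a second `write_script` call that reuses an existing 
script name returns False and leaves one entry in `orphaned_blobs()`.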


Point (2) has different implications for read and write:

=========================
READ
=========================
Assuming a JMAP backend might be heterogeneous (no single blob store; 
e.g., a JMAP API encapsulating an email server and a separate Sieve 
server), the backend implementation might not "know" in step (R2) 
whether a given blobId relates to an email attachment (to be obtained 
from the mail store) or to a Sieve rule (to be obtained from the rule 
store).

There might be two workarounds here:
* Encoding which blob store to look up in the blobId (e.g., using a 
prefix: "sieve-123-123-123")
* Managing email + Sieve backend as different JMAP accounts, each with a 
logically separate blob store (as recommended by Neil in today's call)
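
The first workaround could look roughly like the following Python sketch. 
The function names, the "mail" default, and the dict-based stores are 
assumptions made for illustration; only the "sieve-" prefix idea comes 
from the example above.

```python
def route_blob(blob_id: str, stores: dict, default: str = "mail") -> tuple:
    """Return (store_name, local_id) for a blobId with an optional store prefix."""
    prefix, sep, rest = blob_id.partition("-")
    if sep and prefix in stores:
        # The part before the first "-" names a known store.
        return prefix, rest
    # No recognised prefix: fall back to the default store (an assumption).
    return default, blob_id


def fetch_blob(blob_id: str, stores: dict) -> bytes:
    store_name, local_id = route_blob(blob_id, stores)
    return stores[store_name][local_id]
```

With `stores = {"mail": {...}, "sieve": {...}}`, a blobId of 
"sieve-123-123-123" would be looked up as "123-123-123" in the Sieve 
store, while an unprefixed blobId goes to the mail store.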

=> So, this seems manageable


=========================
WRITE
=========================
If we agree that JMAP for Sieve should be able to wrap arbitrary 
existing Sieve systems (which may persist their scripts in files, LDAP, 
a database, etc.), one might encounter backends that have no blob 
storage available. Let's assume the underlying Sieve storage is a 
database with a fixed schema that requires a single INSERT for script 
name + content, and that would otherwise fail with an integrity 
constraint error.

In this situation, the current write process, as described above, would 
require the server-side JMAP for Sieve implementation to:
* Accept the blob uploaded by the client and *store it temporarily* 
(e.g., write it to disk?)
* Once the client calls "SieveScript/set", look up the staged blob, read 
it, merge it with the metadata, and write both to the database
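
A rough Python sketch of what such a server-side wrapper would have to 
do, using sqlite3 as a stand-in for the fixed-schema database. The class 
name, the table layout, and the in-memory `staged` dict (in place of a 
temporary file) are all assumptions for illustration.

```python
import sqlite3
import uuid


class SieveBackend:
    """Hypothetical wrapper around a database that needs name + content at once."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE scripts (name TEXT PRIMARY KEY, content TEXT NOT NULL)"
        )
        self.staged = {}  # blobId -> content; stands in for a temporary file

    def upload(self, content: str) -> str:
        # The database cannot take content without a name, so stage it.
        blob_id = "tmp-" + uuid.uuid4().hex
        self.staged[blob_id] = content
        return blob_id

    def set_script(self, name: str, blob_id: str) -> None:
        # Look up the staged blob, merge with the metadata, do one INSERT.
        content = self.staged[blob_id]
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO scripts (name, content) VALUES (?, ?)",
                (name, content),
            )
        # Only reached if the INSERT succeeded; otherwise the blob stays staged.
        del self.staged[blob_id]
```

Note that nothing reaches the database until `set_script` is called, and 
a failed INSERT leaves the staged content behind for later cleanup.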

=> We'd argue that this process is complex to implement, and that 
forcing the server to store a temporary file is not particularly 
elegant. It would therefore be good if the script could (alternatively?) 
be submitted inline in the "SieveScript/set" command instead of via a 
blobId.
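
To make the proposal concrete, here is a small Python sketch that builds 
a "SieveScript/set" method call in either form. The inline property name 
`content`, the creation id "new1", and the helper name `sieve_set_call` 
are purely hypothetical; nothing here is taken from the current draft 
beyond the method name and the blobId reference.

```python
def sieve_set_call(account_id, name, *, blob_id=None, content=None, call_id="c1"):
    """Build a JMAP method call triple: [method name, arguments, call id]."""
    if (blob_id is None) == (content is None):
        raise ValueError("provide exactly one of blob_id or content")
    script = {"name": name}
    if blob_id is not None:
        script["blobId"] = blob_id   # current proposal: reference an uploaded blob
    else:
        script["content"] = content  # hypothetical inline property
    arguments = {"accountId": account_id, "create": {"new1": script}}
    return ["SieveScript/set", arguments, call_id]
```

With the inline form, a backend like the one described above could do 
its single INSERT directly, without staging anything.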

Best,
Hans-Joerg

P.S.: A more general question about the "write blob first" approach 
(beyond just uploading Sieve scripts) is to what extent quotas are 
respected. If blob storage can be used for all types of data, but quotas 
(both total and max-element-size) may differ for email attachments, 
Sieve scripts, or files, how is the initial blob storage supposed to 
"know" which quota to check against?

=> This seems to be addressed by a separate quota for unreferenced 
blobs, which includes size and item count [1].

This does not seem entirely elegant if clients can upload large chunks 
of data based on a high "blob quota", but only learn from a subsequent 
command that the server won't accept the previously uploaded file due 
to further type-specific quotas. This could, e.g., cause trouble for a 
brute-force migration client that concurrently uploads large emails or 
files and (due to an imperfect implementation) keeps submitting data 
into the blob store even though email or files are already over quota.

Probably not a very strong concern, but one might alternatively think 
about a way in which users signal their later usage intent (e.g. "email 
attachment") at blob upload time, so that the server can reject early?
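
As a sketch of what "reject early" might mean on the server side, 
assuming the client declares an intent string at upload time: the intent 
names, the `TypeQuota` fields, and the numbers in the usage note are all 
invented for illustration and are not part of any spec.

```python
from dataclasses import dataclass


@dataclass
class TypeQuota:
    max_item_size: int  # largest single item allowed for this type, in bytes
    total: int          # total bytes allowed for this type
    used: int = 0       # bytes already consumed


def accept_upload(quotas: dict, intent: str, size: int) -> tuple:
    """Decide at upload time whether a blob of `size` bytes can be accepted."""
    quota = quotas.get(intent)
    if quota is None:
        return False, "unknown intent"
    if size > quota.max_item_size:
        return False, "item too large"
    if quota.used + size > quota.total:
        return False, "over quota"
    return True, "ok"
```

For example, with a "sieve-script" quota of 64 KiB per item, a 1 MiB 
upload declared as a Sieve script would be refused before any data is 
stored, instead of failing later in "SieveScript/set".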

[1] https://jmap.io/spec-core.html#binary-data- "The server SHOULD use a 
separate quota for unreferenced blobs to the account’s usual quota."

-- 
audriga GmbH
Durlacher Allee 47
76131 Karlsruhe
Tel: +49 (0) 721 17029 316
Fax: +49 (0) 721 17029 3179

support@audriga.com
https://www.audriga.com

Handelsregister: Amtsgericht Mannheim - HRB 713034
Sitz der Gesellschaft: Karlsruhe
Geschäftsführer: Dr. Frank Dengler, Dr. Hans-Jörg Happel
USt-ID: DE 279724142