Re: [Extra] Discussing SNIPPET

Michael Slusarz <michael.slusarz@open-xchange.com> Fri, 21 September 2018 00:03 UTC

Date: Thu, 20 Sep 2018 18:03:08 -0600
From: Michael Slusarz <michael.slusarz@open-xchange.com>
To: Bron Gondwana <brong@fastmailteam.com>, extra@ietf.org
Message-ID: <1732679468.1800.1537488188251@appsuite.open-xchange.com>
In-Reply-To: <0a9f6019-776c-49ff-a800-af494374ce59@sloti22d1t06>
References: <0a9f6019-776c-49ff-a800-af494374ce59@sloti22d1t06>
Archived-At: <https://mailarchive.ietf.org/arch/msg/extra/Y3owYZX2jEg9bX9i9kcjrpHk89A>

> On September 17, 2018 at 5:52 AM Bron Gondwana <brong@fastmailteam.com> wrote:
> 
>     a) Naming.  The great difficulties, naming and cache invalidation! (and yep, we've got the second one too).  Should it be called SNIPPET, or something else like PREVIEW?  I'm asking largely because JMAP used "SearchSnippet" to refer to a fragment of matching text with terms highlighted.  And JMAP used "preview" for the short piece of text (up to 256 characters) that represents the email content.
> 
I don't have a personal attachment to SNIPPET (other than that we have already implemented it under that name, but that's easily worked around with some internal aliasing).

PREVIEW makes sense if we are looking to a future where additional algorithms would be developed.  As Chris points out, if you have algorithms that return both text blobs and something like image data, preview makes more sense semantically than snippet.

I have no objection if we wanted to change the label.


>     b) Cache Invalidation.   Ho hum.  Is this an immutable property?  If so, what happens if the server implementation changes?  I'm quite happy to call it "softly immutable" myself - it might not be exactly the same bytes, but you can use the old version without major problems.
> 
I spent quite a bit of time formulating the logic behind this topic when drafting the spec. Reproduced here for convenience:

-----

A server SHOULD strive to generate the same string for a given message for each request. However, since snippets are understood to be a representation of the message data and not a canonical view of its contents, a client MUST NOT assume that a message snippet is immutable for a given message. This relaxed requirement permits a server to offer snippets as an option without requiring potentially burdensome storage and/or processing requirements to guarantee immutability for a use case that does not require this strictness.

----

I would love to hear feedback on which parts of this are weak or which parts need to be explained further.

I was thinking less about a server invalidating its cache than about a server losing its cache.  Requiring that a server's algorithm stay exactly the same across upgrades, just so it can rebuild identical preview data, is a burden I don't think is useful to enforce.  No user is going to be confused if their preview text differs between mail sessions.
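
To make the client-side consequence of "MUST NOT assume immutability" concrete, here is a minimal sketch of a client cache that treats preview text as replaceable display data. It is only an illustration: the class name, the method names, and the (mailbox, UIDVALIDITY, UID) key layout are assumptions of this sketch, not anything defined in the draft.

```python
class PreviewCache:
    """Hypothetical client-side preview cache (names are illustrative only).

    Keys are assumed to be (mailbox, uidvalidity, uid) tuples. A cached
    preview is treated as a display hint, never as canonical message data.
    """

    def __init__(self):
        self._previews = {}

    def get(self, key):
        # An older preview is still usable for display, even if the server
        # would generate slightly different text today.
        return self._previews.get(key)

    def store(self, key, text):
        """Overwrite with the server's latest value; report whether it changed."""
        changed = self._previews.get(key) != text
        self._previews[key] = text
        return changed


cache = PreviewCache()
key = ("INBOX", 1, 42)
cache.store(key, "Hi all, the meeting has moved to")
# After a server upgrade the same message may yield different text.
# The client simply replaces the old value instead of treating it as an error.
cache.store(key, "Hi all, the meeting has moved to Tuesday")
```

The point of the design is that a changed preview triggers nothing more than a redraw; no resynchronization of the message itself is implied.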

Feedback so far doesn't suggest that this design decision is too troubling.  As mentioned above, I welcome feedback on how this can be explained better in the document.


>     c) Length: 
> 
>        The server SHOULD limit the length of the snippet text to 100
>        characters.  The server MUST NOT output snippet text longer than 200
>        characters.
> 
> 
>     I would prefer to bring this up to more like 200 for should and 256 for MUST NOT.  100 is a pretty short preview on big screens.  I'd also be willing to have a way for the client to hint for shorter previews to save network bandwidth, but have servers generally store 256 chars.  The syntax makes it quite easy to add a MAXLEN=80 or whatever to the fetch command.
> 
100 characters was selected via an informal survey of several different clients, both mobile and desktop, and how they currently display preview-like information in their UI.  It's probably a bit mobile-centric, since I had more of those applications available at the time.

More importantly, that analysis was probably too English-centric.  English is quite compact compared to many other languages, so it's probably a bad example to use when trying to determine ideal lengths (see, e.g., https://www.w3.org/International/articles/article-text-size).

200 characters seems just as arbitrary :)  But arguing over 100 extra characters stored per message is probably not a fight worth pursuing.  So I don't have a strong opinion on raising the SHOULD limit to 200 or the maximum length to 256.  Although 256 isn't all that special, since we are talking about UTF-8 characters, not bytes, so landing on a power of 2 doesn't seem that necessary (just say 250 characters?).
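
The characters-versus-bytes point can be illustrated with a short sketch. It assumes that "character" means a Unicode code point, which is what Python string slicing counts (grapheme clusters are not handled), and it uses the 250 figure floated above. The function name and the sample strings are made up for the example.

```python
def limit_preview(text: str, max_chars: int = 250) -> str:
    """Truncate to at most max_chars Unicode code points (not bytes).

    Assumption of this sketch: the limit counts code points. Python's
    str slicing does exactly that, so no byte arithmetic is needed.
    """
    return text[:max_chars]


# Same character limit, very different sizes once encoded as UTF-8.
samples = {
    "ascii": "a" * 300,
    "german": "\u00fc" * 300,    # U+00FC, 2 bytes in UTF-8
    "japanese": "\u3042" * 300,  # U+3042, 3 bytes in UTF-8
}

for name, text in samples.items():
    preview = limit_preview(text)
    print(name, len(preview), len(preview.encode("utf-8")))

# prints:
# ascii 250 250
# german 250 500
# japanese 250 750
```

Since the stored size already varies by a factor of three for the same character count, a power-of-2 character limit buys nothing in terms of storage alignment.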

MAXLEN seems like overkill though.  If a server is buffering output, we're not going to save that many network packets if a client requests partial text, since the use case for preview is only to load data for the mailbox viewport.  And it adds more complexity on the server side that doesn't seem to give us much benefit.

If we really wanted some sort of maxlen, for consistency's sake it probably makes more sense to implement it with the <<partial>> syntax anyway.

michael