Re: [MORG] POP3 LIST+ Extension: Monitoring for mailbox changes

Steffen Lehmann <lehmann@strato-rz.de> Sat, 05 February 2011 21:10 UTC

Return-Path: <lehmann@strato-rz.de>
X-Original-To: morg@core3.amsl.com
Delivered-To: morg@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id BC5C53A6C1A for <morg@core3.amsl.com>; Sat, 5 Feb 2011 13:10:50 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 0.95
X-Spam-Level:
X-Spam-Status: No, score=0.95 tagged_above=-999 required=5 tests=[HELO_EQ_DE=0.35, J_CHICKENPOX_43=0.6]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id HqMl7MNFIoEY for <morg@core3.amsl.com>; Sat, 5 Feb 2011 13:10:49 -0800 (PST)
Received: from fredda.webgods.de (fredda.webgods.de [192.166.196.83]) by core3.amsl.com (Postfix) with ESMTP id 0EF103A6C13 for <morg@ietf.org>; Sat, 5 Feb 2011 13:10:45 -0800 (PST)
Message-ID: <4D4DBCD3.6000307@strato-rz.de>
Date: Sat, 05 Feb 2011 22:10:43 +0100
From: Steffen Lehmann <lehmann@strato-rz.de>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.13) Gecko/20101207 Thunderbird/3.1.7
MIME-Version: 1.0
To: Timo Sirainen <tss@iki.fi>
References: <4D4ACA2A.3080401@strato-rz.de> <1296844981.18488.385.camel@hurina>
In-Reply-To: <1296844981.18488.385.camel@hurina>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: morg@ietf.org
Subject: Re: [MORG] POP3 LIST+ Extension: Monitoring for mailbox changes
X-BeenThere: morg@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Messaging Organization <morg.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/morg>, <mailto:morg-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/morg>
List-Post: <mailto:morg@ietf.org>
List-Help: <mailto:morg-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/morg>, <mailto:morg-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sat, 05 Feb 2011 21:10:50 -0000

Hello Timo,

>> POP3 clients using the LIST command, and,
>>    immediately followed, the UIDL command to get the Message Number,
>>    Message Size and Unique-ID, of all messages of a mailbox, to see if
>>    new messages have arrived.  On large mailboxes, holding hundreds or
>>    thousands of mails, this causes serious resource consumption on the
>>    server, and will prolong the duration a mailbox is hold in locked
>>    state.  This situation is even stressed by the fact that POP3 clients
>>    will usually poll a mailbox for new messages periodically in short
>>    intervals.
> 
> I agree with the above, but
> 
>> Using the LIST+ Extension, a server has to scan through the mailbox
>>    only for a single time, and can send the Message Number, Message Size
>>    and Unique-ID in a more compact form to the client (as shown in
>>    Appendix A.2.1).
> 
> I don't see this helping all that much with the server's resource
> consumption or even much with the client's bandwidth consumption. The
> server still has to look up all the sizes and UIDLs. In Appendix A the
> client downloads 22% less.

To scan through all messages of a mailbox can be a time/CPU/memory
consuming task for a server on big mailboxes. Servers do this usually in
a loop. This could prevent the server to serve other client connections,
while running in this loop (it strongly depends of the server
implementation). Using LIST+, the server has to perform only one loop
over the mailbox instead of two, to provide the same information.

On big mailboxes, polled via slow connections (like Modem, ISDN/E1/BRI,
Mobile/GSM/GPRS, Packet Radio, etc.), it could be a "lot of wood" if the
server would save 22% output.

When I say "big" mailboxes, I speak of mailboxes with about 50.000
messages. In my company, a large ISP provider in Germany, there are
several customers having mailboxes as big as this.


> Most of the time POP3 clients just poll an unchanged mailbox. It would
> be possible to get a much more dramatic bandwidth improvement by being
> able to tell clients that nothing had changed since the last time.
> Here's an example of how it could work:
> 
> Client connecting for the first time:
> 
>    C: LIST +UIDL
>    S: +OK 14758f1afd44c09b7992073ccf00b43d 3 messages, listing follows:
>    S: 1 4536 7abcb6f4da22080e121f2824aef2be9a
>    S: 2 4422 577151e65ea4e62bb9b1c5830a818fe7
>    S: 3 3828 2fe542ac6d1c29bb6a5161558f527e2e
>    S: .
> 
> Client connecting the second time when there are no changes:
> 
>    C: LIST +UIDL +ID=14758f1afd44c09b7992073ccf00b43d
>    S: +OK 14758f1afd44c09b7992073ccf00b43d No changes
>    S: .
> 
> Client connecting the 3rd time when there are changes:
> 
>    C: LIST +UIDL +ID=14758f1afd44c09b7992073ccf00b43d
>    S: +OK ddd2cae8a4a5ec890737d07adb537b99 4 messages, listing follows:
>    S: 1 4536 7abcb6f4da22080e121f2824aef2be9a
>    S: 2 4422 577151e65ea4e62bb9b1c5830a818fe7
>    S: 3 3828 2fe542ac6d1c29bb6a5161558f527e2e
>    S: 4 3284 86519edbbe78d203c5c9618df50b21a7
>    S: .
> 
> So, the client would just remember the value after the "OK". The server
> can use whatever value it wants there, but one simple possibility would
> be to just use a hash(all UIDLs).
> 
> This could be made to save more bandwidth by allowing server to send
> only changes based on the given ID, such as:
> 
>    C: LIST +UIDL +ID=14758f1afd44c09b7992073ccf00b43d
>    S: +OK ddd2cae8a4a5ec890737d07adb537b99 1 new message:
>    S: 4 3284 86519edbbe78d203c5c9618df50b21a7
>    S: .
> 
> Deleted messages could also be reported by adding more complexity.
> 

This is an interesting idea.
But doing it using the ID as proposed raises a couple problems:

Hashes are no unique value.

If this ID is a hash as proposed, the server has to compare its value
with that hash it is just generating inside the loop on every line.
Unfortunately, hash functions must be "finalized" to get the hash
function output. If the hash does't match, the server hash to start the
hash function again, and has to put in all data from the beginning up to
the current line into the hash function again. This would be overkill,
and makes using of a hash impractical for this purpose.

A better way would be, the client tells the sever the tuple of
message-number/UIDL of the very last message it knows. The server can
easily compare this two values with the message-number/UIDL tuple of
every message inside the loop, i.e.:

Client connecting for the first time:
   C: LIST +UIDL +AGE +UNSEEN
   S: +OK
   S: 1 4536 7abcb6f4da22080e121f2824aef2be9a 32 16
   S: 2 4422 577151e65ea4e62bb9b1c5830a818fe7 30 0
   S: 3 3828 2fe542ac6d1c29bb6a5161558f527e2e 0 0
   S: .
The server has to send all messages, because of no message-number/UIDL
tuple was given. The client does also fetch the AGE and UNSEEN values
and stores it (these values are in units of days, so the client needs to
fetch them ony one time per day).

Client connecting the second time when there are no changes.
Client sends UIDL with a message-number/UIDL tuple of the last message:
   C: LIST +UIDL=3/2fe542ac6d1c29bb6a5161558f527e2e
   S: +OK
   S: 3 3828 2fe542ac6d1c29bb6a5161558f527e2e
   S: .
In this case, the server only sends the last message. The client can
easily determine that this message is already known, and ignore it.
Sending a non-empty list is necessary to indicate that there are still
messages in the mailbox, an empty list would indicate an empty mailbox.

Client connecting the 3rd time when a new message arrived,
(but no other changes):
   C: LIST +UIDL=3/2fe542ac6d1c29bb6a5161558f527e2e
   S: +OK
   S: 4 3284 86519edbbe78d203c5c9618df50b21a7
   S: .
Only the new message(s) will be listed.
4/86519edbbe78d203c5c9618df50b21a7 becomes the new tuple.

Client connecting the 4th time when message 3 was deleted
by another client:
   C: LIST +UIDL=4/86519edbbe78d203c5c9618df50b21a7
   S: +OK
   S: 1 4536 7abcb6f4da22080e121f2824aef2be9a
   S: 2 4422 577151e65ea4e62bb9b1c5830a818fe7
   S: 3 3284 86519edbbe78d203c5c9618df50b21a7
   S: .
Former message 4 becomes message 3. The sever sends a complete list.
3/86519edbbe78d203c5c9618df50b21a7 becomes the new tuple.

Client connecting the 5th time when all messages were deleted
by another client:
   C: LIST +UIDL=3/86519edbbe78d203c5c9618df50b21a7
   S: +OK
   S: .
The server sends a complete list, which is empty in this case.

The following rules apply:
1. The server has to send a complete message listing in following cases:
- LIST command without +UIDL Flag,
- +UIDL Flag without parameter
- +UIDL Flag with parameter, but the server cannot find a message with
matching message-number/UIDL tuple. This happens if messages on the
server were deleted, resulting in renumbering of all messages above the
deleted message. Renumbering the messages invalidates the
message-number/UIDL tuples of all messages with message-numbers above
the delete message.

2. When client's message-number/UIDL tuple matches the one of server's
last message, the server just issues the last message. This indicates
that there are no changes in the mailbox.

3. If client's message-number/UIDL tuple matches with the one of any
message on the server, and this is not the last message, the server
issues all messages having message-numbers greater than the
message-number as indicated in the tuple (which are, in fact, all newly
arrived messages).

4. If the client receives a message listing whose first message-number
is one, or if the client receives an empty message listing, it MUST
consider this message listing as a "complete" message listing. All
messages known by the server are included in this message listing.

5. If the client receives a message listing whose first message-number
is greater than one, it MUST consider this message listing as a
"partial" message listing. The server should keep knowledge of all
messages it already knows, and should add knowledge of all messages it
does not already know.

Note:
The term "changes in a mailbox" means either message deletion or message
delivery. Changes in the values of the +AGE and +UNREAD Flags, or any
other meta data, cannot be monitored in this way.

Monitoring of message deletion would require a more sophisticated
approach. It has to create a context on the server, having an unique ID,
where the server can store information about the state of each client.
In IMAP, this state is represented by the client connection, but as with
POP3, clients usually disconnect immediately after information
retrieval, so the client connection cannot be used for this purpose.
IMHO, this would things make too complex. Because most of the time POP3
clients just poll an unchanged mailbox, the mechanism as proposed above
has already enough potential to reduce bandwith drastically.


>> The +UNREAD Flag requests to add the number of whole days, since the
>>    message was read for last time, to the extended scan listing.
> 
> What exactly is the "last read time" for a message? RETR time? When it
> was marked as \Seen in webmail/IMAP?
> 

I want to rename the +UNREAD Flag to +LASTREAD, and introduce another
Flag +FIRSTREAD. This offers more flexibility for POP3 client
developers, when they want to use these values as input for different
kinds of automatic message deletion processes.
The meaning would be:
  +FIRSTREAD: The time when the message was read for very first time
(when i.e. IMAP sets the \Seen flag)
  +LASTREAD: The time when the message was read the very last time.
In POP3, a message is to be considered as "read" after a successful RETR
command has been issued for that message. Reading a message with the TOP
command shall not mark a message as "read", even if all body lines were
read.

Steffen