Re: [apps-discuss] Alissa Cooper's Discuss on draft-ietf-appsawg-sieve-duplicate-07: (with DISCUSS and COMMENT)

Ned Freed <ned.freed@mrochek.com> Wed, 25 June 2014 00:56 UTC

Message-id: <01P9EFAYDH680049PU@mauve.mrochek.com>
Date: Tue, 24 Jun 2014 15:32:18 -0700
From: Ned Freed <ned.freed@mrochek.com>
In-reply-to: "Your message dated Tue, 24 Jun 2014 23:01:42 +0200" <53A9E736.9080709@rename-it.nl>
References: <20140620004041.5801.22430.idtracker@ietfa.amsl.com> <53A3E7EB.1030604@rename-it.nl> <CFCDF85C.42C1C%alissa@cooperw.in> <53A9E736.9080709@rename-it.nl>
To: Stephan Bosch <stephan@rename-it.nl>
Archived-At: http://mailarchive.ietf.org/arch/msg/apps-discuss/7SOdDmDISflQI0xdeGPPb0Lt6wE
Cc: apps-discuss@ietf.org, draft-ietf-appsawg-sieve-duplicate@tools.ietf.org, appsawg-chairs@tools.ietf.org, The IESG <iesg@ietf.org>, ned+ietf@mrochek.com
Subject: Re: [apps-discuss] Alissa Cooper's Discuss on draft-ietf-appsawg-sieve-duplicate-07: (with DISCUSS and COMMENT)

> >>> The suggested default seems really long, especially for the example use
> >>> case described in Section 1.

Am I missing something? The first example is a straightforward one where
message-id duplicates are filed to a special "Trash/Duplicate" folder. In such
a case, basically, the longer the retention interval the better things work.
Given the way message-ids are typically generated by most clients, it's not like
the longer interval increases the likelihood of a false duplicate occurring.
 
If the longer retention period has a downside for the user I am not seeing it.
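
To make the point concrete, here is a minimal sketch of message-id duplicate tracking with a retention interval. The class and method names are hypothetical and the in-memory dict merely stands in for whatever store a server uses; it assumes expiry is measured from the first time a message-id is seen. It is not taken from the draft or from any implementation.

```python
import time
from typing import Dict, Optional


class DuplicateTracker:
    """Hypothetical in-memory tracker keyed on message-id."""

    def __init__(self, retention_seconds: int) -> None:
        self.retention_seconds = retention_seconds
        self._expires_at: Dict[str, float] = {}

    def is_duplicate(self, message_id: str, now: Optional[float] = None) -> bool:
        """Return True if message_id was seen within the retention interval."""
        now = time.time() if now is None else now
        expiry = self._expires_at.get(message_id)
        if expiry is not None and expiry > now:
            return True
        # First sighting, or the old entry has expired: start tracking afresh.
        self._expires_at[message_id] = now + self.retention_seconds
        return False
```

With a 7 day interval a copy arriving three days late is still caught, whereas a 1 day interval lets it through. A longer interval does not make distinct message-ids collide.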

So whether or not a 7 day retention is problematic really depends on the server
environment. If you're running a server for a relatively small number of users,
e.g., small to medium sized enterprises, schools, small to medium sized
ISPs, etc., then given the high likelihood that not everyone will opt to use
this, and given the power and capabilities of modern server hardware, I
seriously doubt that a 7 day default will cause any problems. A single instance
of memcached can handle an awful lot of entries of this type.

Of course a large ISP/MSP with users numbering in the many millions is going to
face a very different set of tradeoffs. But I have a sneaking suspicion that in
the unfortunately unlikely event that such a concern would bother to offer this
capability, they are capable of picking appropriate defaults, which the
document certainly allows.

And more generally, it's not like the necessary expertise to size database back
ends of various sorts is hard to come by.

> >>> On the other hand, it seems odd that the
> >>> user's choice would be overriden by the preset maximum.

I don't know if anyone else has pointed this out, but that's not quite how it
works. The user's choice is only overridden if it exceeds the maximum. And the
reason for that is simple: All servers need to prevent users from creating
entries that last for absurdly long periods, and in cases where the number
of retained entries is a potential problem, to prevent users from overloading
the system.

> >>> This would make
> >>> more sense to me if the default expiration were shorter and the user
> >>> could override it with a longer :seconds argument if he wanted. What is
> >>> the rationale for doing it the opposite way?

Again, that's not how it works. There are two values involved: A default
and a maximum. The default is what's used when no value is specified, whereas
the maximum is, well, the maximum. And the draft 

I don't really think the text is unclear about the distinction between defaults
and maximums, but if it is then it definitely needs to be changed.

And FWIW, my implementation of this also implements a minimum that is silently
substituted if a user goes under the value. My rationale for having this is
so something sensible is done when users screw up and enter absurdly low
values, then complain when their problem isn't solved. But I don't think
this is sufficiently useful that it belongs in the standard.
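
The interplay of default, maximum and (implementation-specific) minimum described above can be sketched roughly as follows. Only the 7 day default comes from this thread; the maximum and minimum values and all names are hypothetical placeholders, not values from the draft or from any implementation.

```python
from typing import Optional

DEFAULT_SECONDS = 7 * 24 * 3600  # the 7 day default discussed in this thread
MAX_SECONDS = 30 * 24 * 3600     # hypothetical site-configured maximum
MIN_SECONDS = 60                 # hypothetical implementation-specific minimum


def effective_seconds(requested: Optional[int]) -> int:
    """Return the retention period actually applied to a tracking entry."""
    if requested is None:
        # No :seconds argument given: the default applies.
        return DEFAULT_SECONDS
    # An explicit value is kept unless it falls outside [MIN, MAX],
    # in which case the nearest bound is silently substituted.
    return max(MIN_SECONDS, min(requested, MAX_SECONDS))
```

So a user asking for one hour gets one hour, a user asking for 2**32 seconds gets the maximum, and a user who specifies nothing gets the default.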

> >> The default was chosen in Sieve mailing list discussions, but in essence
> >> it is pretty arbitrary. It is mainly based on the period in which a
> >> series of duplicate messages may arrive with a margin of a few more days.
> > I find it a bit surprising that receipt of a duplicate multiple days after
> > the first message is received is that commonplace on today’s Internet. Is
> > there any data around to back that up?

> Arnt mentioned one example. I am no expert on this; I couldn't tell you.
> If I remember correctly, Alexey Melnikov proposed the default of seven days.

It's fairly uncommon for there to be significant delays, but that is largely
irrelevant. The point is it does happen, and when it does and duplicates get
through, users who have gone to the trouble of setting this up will see them
and complain. And since email is often seen as a cost, not a value-added
service, trading off a bit more hardware - even if it comes to that - for fewer
support calls is not a bad idea.

> The "vacation" extension specification [RFC5230] mentions retention
> periods with similar magnitude (for ":days" argument).

And for similar reasons: Getting multiple vacation replies is fairly
irritating.

> >> The rationale for a maximum is to prevent users from having the ability
> >> to create duplicate tracking list entries that linger indefinitely.
> > Well, “indefinitely” wouldn’t necessarily be possible, because if the user
> > specifies nothing then it’s the default, and otherwise he could be
> > required to specify _something_, correct?

> True, but that something could be e.g. 2**32, which is pretty close to
> indefinitely.

I confess I don't understand this at all.

> > But I can appreciate that you don’t want to impose a long, arbitrary
> > retention requirement. Is the idea that the maximum would be surfaced
> > to the user if he’s using some GUI  create a duplicate filter?

> My implementation can provide a warning when this situation occurs. In
> the case of a GUI this would be returned through ManageSieve with the
> WARNINGS response code; http://tools.ietf.org/html/rfc5804#section-1.3 .
> I am not sure whether other implementations would do this or whether
> GUIs actually show these warnings, but at least there is a specification
> of this facility.

That's not a bad idea, although I note that nothing says the maximum that's
in effect when the sieve is created is going to be the same as the one
in effect when the sieve is executed.
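
A rough sketch of such an upload-time check, with hypothetical function and parameter names (this is not the ManageSieve API, just the comparison a server might make before returning a warning to the client):

```python
from typing import List, Optional


def upload_warnings(requested_seconds: Optional[int], current_max: int) -> List[str]:
    """Return warnings for a :seconds value checked at script upload time.

    current_max is the maximum configured when the script is uploaded;
    it may differ from the maximum in effect when the script later runs.
    """
    if requested_seconds is not None and requested_seconds > current_max:
        return [
            f":seconds {requested_seconds} exceeds the current maximum "
            f"of {current_max}; the maximum will be used instead"
        ]
    return []
```

Because the check only sees the maximum configured at upload time, a script that uploads cleanly can still have its value capped later if the administrator lowers the maximum.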

> I could add text that implementations SHOULD issue a warning when a
> maximum ":seconds" value is substituted over what the user specified.

I don't have a problem with that as long as it's specific to managesieve.
There's really no straightforward viable path for reporting warnings at sieve
execution time. (The closest thing we have to that would be the addition of a
header, but the odds of anyone seeing it are so low there's really no point.)
And I don't think this extension is the right place to get into the issue of
how to provide warnings from sieve at execution time.

OTOH, I think we may have crossed the line into overdesign here.

				Ned