Re: [apps-discuss] Working Group Last Call draft-ietf-appsawg-sieve-duplicate

Stephan Bosch <stephan@rename-it.nl> Fri, 10 January 2014 00:02 UTC

Message-ID: <52CF384D.3080502@rename-it.nl>
Date: Fri, 10 Jan 2014 01:01:17 +0100
From: Stephan Bosch <stephan@rename-it.nl>
To: "t.petch" <ietfc@btconnect.com>, "Murray S. Kucherawy" <superuser@gmail.com>, IETF Apps Discuss <apps-discuss@ietf.org>
References: <CAL0qLwZqJPTssNVLLaSjAP5wqteZ==fuawNF+WUZYvi+YWV1UQ@mail.gmail.com> <00a301cf07e8$01352160$4001a8c0@gateway.2wire.net>
In-Reply-To: <00a301cf07e8$01352160$4001a8c0@gateway.2wire.net>
Subject: Re: [apps-discuss] Working Group Last Call draft-ietf-appsawg-sieve-duplicate

Hi Tom,

First of all, thanks for your review.  :)

On 1/2/2014 7:24 PM, t.petch wrote:
> This I-D seems to need more thought.
>
> s.1 "For example, if a member of the list decides to
>    reply to both the user and the mailing list itself, the user will
>    receive one copy of the message directly and another through the mailing
>    list."
>
> Well, they MAY, but they don't on a good list system, such as the one in
> use here.
>
> "Also, if someone cross-posts over several mailing lists to
>    which the user is subscribed, the user will receive a copy from each
>    of those lists."
>
> Ditto, not here.

Ok, but these situations are quite common. I've implemented an earlier
version of this extension based on user requests. So, what do you want
me to do? Word this differently so that it is clear that this shouldn't
happen for sanely configured mailing lists?

> "   Duplicate messages are normally detected using the Message-ID header
>    field, which is required to be unique for each message.  "
>
> REQUIRED maybe, but I seem to recall the malformed-mail I-D raising the
> possibility that it was not.  In which case, ...?

I'm not sure how common that is, but you are right: as the
specification currently stands, that would cause a false positive. We
could make the default a bit more complex by combining the Message-ID
with some other header field (Date, perhaps), further reducing the
likelihood of a false positive. I guess we need to think about that a
little more.
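One way such a composite default could look, purely as an illustration: the helper name tracking_key, the newline separator, and the SHA-256 hashing are assumptions, not anything the draft specifies. A newline is used as the separator because it cannot occur in an unfolded header value, so distinct (Message-ID, Date) pairs cannot run together into the same string.

```python
import hashlib

def tracking_key(message_id: str, date: str) -> str:
    """Hypothetical composite default: Message-ID plus Date.

    Values are trimmed of SP/HTAB, joined with a newline (which cannot occur
    in an unfolded header value), and hashed to a fixed-length hex string.
    """
    combined = message_id.strip(" \t") + "\n" + date.strip(" \t")
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()

same_id = "<abc@example.org>"
k1 = tracking_key(same_id, "Thu, 9 Jan 2014 16:02:23 -0800")
k2 = tracking_key(same_id, "Fri, 10 Jan 2014 01:01:17 +0100")
print(k1 == k2)  # False: a reused Message-ID no longer collides
```

Two distinct messages that reuse a Message-ID but carry different Date fields then produce different keys, while copies of one message arriving through several lists, which normally keep both fields intact, still match.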

> s.3
> "an earlier Sieve execution."
> reading on, it is apparent that this is any number of executions limited
> by the size of the FIFO cache and the maximum lifetime of entries in the
> cache.

Yes. So, what exactly is your comment here? I don't think it is useful
to mention such detail early in the description.
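The behavior Tom describes (a match can come from any earlier execution, bounded by the list's capacity and the lifetime of its entries) can be modeled roughly as follows. This is a minimal sketch, not the draft's normative algorithm: the class name, the fixed capacity, oldest-first eviction, recording the value immediately, and not refreshing an entry's expiry on a hit are all assumptions.

```python
from collections import OrderedDict

class DuplicateTrackingList:
    """Hypothetical model: bounded FIFO list of tracked values with expiry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._expiry = OrderedDict()  # value -> absolute expiry time (seconds)

    def check_and_record(self, value: str, now: float, seconds: int) -> bool:
        """Return True if value is still tracked; otherwise record it."""
        expires_at = self._expiry.get(value)
        if expires_at is not None and expires_at > now:
            return True  # assumption: a hit does not refresh the expiry time
        self._expiry.pop(value, None)           # drop a stale entry, if any
        self._expiry[value] = now + seconds     # newest entries go at the end
        while len(self._expiry) > self.capacity:
            self._expiry.popitem(last=False)    # evict the oldest entry
        return False

tracker = DuplicateTrackingList(capacity=2)
print(tracker.check_and_record("<a@x>", now=0, seconds=100))    # False
print(tracker.check_and_record("<a@x>", now=50, seconds=100))   # True
print(tracker.check_and_record("<a@x>", now=150, seconds=100))  # False (expired)
```

Passing the current time in explicitly keeps the model deterministic; a real implementation would read the clock and persist the list between Sieve executions.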

> "   Usage:  [":header" <header-name: string> /
>                           ":uniqueid" <value: string>]
>
> Why have two ways of doing the same thing?  As I read it, this test is on
> a header field, so why not have just ":header" with a default of
> Message-ID?

I am not sure what you mean here. The :uniqueid argument explicitly does
not operate (directly) on a message header, but rather on some string
value composed by the user (using the variables extension). This value can
consist of header field contents, but also of message body content or even
data from some source other than the message being delivered.

> And what happens if I use header field X in one execution and then
> header field Y in another? I presume separate caches for X and Y, in
> which case, duplicates may not be detected.

No, there is only one 'cache' (in the document it is called the
duplicate tracking list). See the following text:

 The "duplicate" test MUST track a unique ID value independent of its
 source.  This means that it does not matter whether values are
 obtained from the message ID header, from an arbitrary header
 specified using the ":header" argument or explicitly from the
 ":uniqueid" argument.

Some examples follow this text. Do you mean that this needs to be
clarified more?
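To make the single-list behavior concrete for the X-then-Y question, here is a rough sketch in which the value is looked up in one shared set no matter how it was obtained. The function names, treating a missing header as "not a duplicate", and the use of Python's standard email module for parsing are assumptions for illustration only.

```python
import email
from email.message import Message
from typing import Optional

def duplicate_value(msg: Message, header: Optional[str] = None,
                    uniqueid: Optional[str] = None) -> Optional[str]:
    """Pick the tracked value: :uniqueid if given, else the named header,
    else Message-ID (hypothetical helper)."""
    if uniqueid is not None:
        return uniqueid
    raw = msg.get(header if header is not None else "Message-ID")
    return None if raw is None else str(raw).strip(" \t")

# One shared tracking list, regardless of where values came from.
seen: set = set()

def duplicate(msg: Message, header: Optional[str] = None,
              uniqueid: Optional[str] = None) -> bool:
    value = duplicate_value(msg, header, uniqueid)
    if value is None:          # assumption: absent header -> not a duplicate
        return False
    if value in seen:
        return True
    seen.add(value)
    return False

msg = email.message_from_string(
    "Message-ID: <abc@example.org>\nX-Foo: 1234\nX-Bar: 1234\n\nbody\n")
print(duplicate(msg, header="X-Foo"))      # False: first sighting
print(duplicate(msg, header="X-Bar"))      # True: same value, different header
print(duplicate(msg, uniqueid="1234"))     # True: same value via :uniqueid
print(duplicate(msg))                      # False: Message-ID not seen yet
```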

> The use of multiple fields
> opens up all sorts of complications that need more explanation depending
> on the concept of the scope of the operation, which I do not see clearly
> explained.

I am not sure what you mean here. I am assuming this comment is not
relevant given the above. Please clarify otherwise.

> "The user can explicitly control the
>    length of this expiration time by means of the ":seconds" argument,
>    which is always specified in seconds.  "
>
> seconds seems short to me.  On the IETF lists, I typically see a gap of
> several hours between a message on one list and a message on another
> list, with four hours being the norm.  I would regard 5 minutes as the
> minimum and 36 hours, or perhaps less, as the maximum.

Given the vacation-seconds extension, the use of a seconds granularity
is not strange in the Sieve realm.

I like the flexibility of using seconds (mailing lists are not the only
application area of this extension), but I am not against changing it to
:minutes per se. Do any other people have thoughts?
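For scale, Tom's suggested bounds expressed in seconds are 300 (5 minutes) and 129600 (36 hours). Keeping seconds as the unit does not stop an implementation from enforcing such a range. A hypothetical clamp could look like this; the constant names and the clamping policy are assumptions, not anything in the draft.

```python
MIN_SECONDS = 5 * 60        # 300: Tom's suggested minimum (5 minutes)
MAX_SECONDS = 36 * 60 * 60  # 129600: Tom's suggested maximum (36 hours)

def effective_seconds(requested: int) -> int:
    """Clamp a user-supplied :seconds value into a site-configured range."""
    return max(MIN_SECONDS, min(requested, MAX_SECONDS))

print(effective_seconds(60))         # 300
print(effective_seconds(4 * 3600))   # 14400 (Tom's typical four-hour gap)
print(effective_seconds(7 * 86400))  # 129600
```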

> "By adding the ":header" argument with a message header
>    field name, the content of the specified header field can be used as
> "
>
> Does this apply to all headers? I assume it does, since if Message-ID
> is not used, then it would be an X- proprietary one that would seem to
> me to be the next best thing.

Any header can be selected with this argument.

> "   If the tracked unique ID value is extracted directly from a message
>    header field, i.e., when the ":uniqueid" argument is not used,"
>
> you are saying that when uniqueid is not used, then a unique ID is used.
> I think that this will cause confusion here and elsewhere - you need
> another term than 'unique ID' as the collective noun for the identifying
> string you are using in the equality comparison.

Hmm, yeah. Any suggestions? :)

> "leading and trailing whitespace MUST first be trimmed from the value"
>
> This is a can of worms.  Normalisation often appears on these lists
> without, usually, a satisfactory answer, let alone the issues of i18n.
> More needs to be considered here.

This mainly serves to prevent stray white space from messing with the
string match. The core Sieve language does the same, for instance in the
header test. And how would i18n be relevant here?
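A sketch of the trimming step, under one assumption that bears on the i18n question: only ASCII space and horizontal tab are trimmed, after header unfolding. Python's argument-less str.strip() would also remove Unicode whitespace such as U+00A0 (no-break space), which is exactly the kind of choice a normalisation rule would have to pin down. The function names are hypothetical.

```python
import re

def unfold(value: str) -> str:
    """Undo header folding: drop a line break that is followed by SP or HTAB.

    Accepts CRLF or a bare LF, since parsed mail often uses LF line endings.
    """
    return re.sub(r"\r?\n(?=[ \t])", "", value)

def trim(value: str) -> str:
    """Unfold, then trim only ASCII SP and HTAB from both ends (assumption)."""
    return unfold(value).strip(" \t")

print(repr(trim("  <abc@example.org>\t")))        # '<abc@example.org>'
print(repr(trim("<abc@\r\n example.org>")))       # '<abc@ example.org>'
print(repr(trim("\u00a0<abc@example.org>")))      # '\xa0<abc@example.org>'
print(repr("\u00a0<abc@example.org>".strip()))    # '<abc@example.org>'
```

The last two lines show the difference: the ASCII-only rule keeps the no-break space, while a Unicode-aware strip removes it.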

Regards,

Stephan.