Re: [apps-discuss] Working Group Last Call draft-ietf-appsawg-sieve-duplicate

t.petch <ietfc@btconnect.com> Sat, 11 January 2014 12:49 UTC

Message-ID: <005501cf0eca$faf86840$4001a8c0@gateway.2wire.net>
From: "t.petch" <ietfc@btconnect.com>
To: Stephan Bosch <stephan@rename-it.nl>, "Murray S. Kucherawy" <superuser@gmail.com>, IETF Apps Discuss <apps-discuss@ietf.org>
References: <CAL0qLwZqJPTssNVLLaSjAP5wqteZ==fuawNF+WUZYvi+YWV1UQ@mail.gmail.com> <00a301cf07e8$01352160$4001a8c0@gateway.2wire.net> <52CF384D.3080502@rename-it.nl>
Date: Sat, 11 Jan 2014 12:45:07 +0000
Subject: Re: [apps-discuss] Working Group Last Call draft-ietf-appsawg-sieve-duplicate

Mostly inline, but my big comment, which gets lost in the detail, is about
the scope of caches.  You say
"there is only one 'cache' (in the document it is called the duplicate
tracking list)"

Really, worldwide, like the DNS root?-)

Hitherto, as I understand it, Sieve has had no state; a filter is
applied to a stream of messages and that is it, forget it and start
again.  This adds state.  This then introduces the question of scope.
If I get the same message from two different ISPs (which I do), will
that be detected?  I expect not.  When those ISPs outsource their MX to
the same third party, will that be detected?  I expect not, as long as
they have different mail domains.  At what point is there a single
cache?  When I use multiple mailboxes for the same mail domain?  Only
when I use a specific mailbox?  Or when?

In a sense it does not matter, because it only affects the user
experience and not the protocol on the wire, but I expect that users
will care if the implementation on ISP A gives totally different results
from ISP B.  And of course, when there is more than one cache (worldwide
:-), then operators of this facility will need more resources to
maintain them.
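To make the scope question concrete, here is a minimal Python model, assuming (hypothetically) that each duplicate tracking list is keyed by an opaque scope string such as "isp-a:tom". The class name ScopedDuplicateTracker and the scope strings are illustrative only and do not come from the draft; the model also records a value as soon as it is checked, which is a simplification.

```python
from collections import defaultdict


class ScopedDuplicateTracker:
    """Hypothetical model: one duplicate tracking list per scope.

    What a "scope" is (user, mailbox, mail domain, ISP) is exactly the
    open question; here it is just an opaque string chosen by the caller.
    """

    def __init__(self):
        # scope -> set of tracked values already seen in that scope
        self._lists = defaultdict(set)

    def is_duplicate(self, scope: str, value: str) -> bool:
        """Return True if value was already seen in this scope; record it."""
        seen = self._lists[scope]
        if value in seen:
            return True
        seen.add(value)
        return False


tracker = ScopedDuplicateTracker()
msg_id = "<005501cf0eca$faf86840$4001a8c0@gateway.2wire.net>"
print(tracker.is_duplicate("isp-a:tom", msg_id))  # False: first copy at ISP A
print(tracker.is_duplicate("isp-b:tom", msg_id))  # False: ISP B has its own list
print(tracker.is_duplicate("isp-a:tom", msg_id))  # True: second copy at ISP A
```

Whether two deliveries end up with the same scope string is precisely what the draft would need to pin down.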

Tom Petch

----- Original Message -----
From: "Stephan Bosch" <stephan@rename-it.nl>
To: "t.petch" <ietfc@btconnect.com>; "Murray S. Kucherawy"
<superuser@gmail.com>; "IETF Apps Discuss" <apps-discuss@ietf.org>
Sent: Friday, January 10, 2014 12:01 AM
> On 1/2/2014 7:24 PM, t.petch wrote:
> > This I-D seems to need more thought.
> >
> > s.1 "For example, if a member of the list decides to
> >    reply to both the user and the mailing list itself, the user will
> >    [receive] one copy of the message directly and another through the
> >    mailing list."
> >
> > Well, they MAY, but they don't on a good list system, such as the one in
> > use here.
> >
> > "Also, if someone cross-posts over several mailing lists to
> >    which the user is subscribed, the user will receive a copy from each
> >    of those lists."
> >
> > Ditto, not here.
>
> Ok, but these situations are quite common. I've implemented an earlier
> version of this extension based on user requests. So, what do you want
> me to do? Word this differently so that it is clear that this shouldn't
> happen for sanely configured mailing lists?

I found the 'will' too forceful; my instant reaction was 'no it won't',
because that is my experience.  Just moderate it slightly ('will often',
'may', 'commonly'), anything but a somewhat forceful 'will'.

> > "   Duplicate messages are normally detected using the Message-ID header
> >    field, which is required to be unique for each message.  "
> >
> > REQUIRED maybe, but I seem to recall the malformed-mail I-D raising the
> > possibility that it was not.  In which case, ...?
>
> I'm not sure how common that is, but you are right: as the
> specification is now, that would cause a false positive. We can make the
> default a bit more complex by combining the Message-ID with some other
> header (Date perhaps?), thereby further reducing the likelihood of a
> false positive. I guess we need to think about that a little more.

Yes, I think you should allow for the possibility in the I-D - as to
how, I am less fussed.  Could be 'outside the scope of' up to 'should
detect duplicates and discard as malformed' - just show that it has been
considered.
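One way to read Stephan's suggestion of combining the Message-ID with another header such as Date is sketched below in Python. The function name composite_key and the use of SHA-256 over the joined values are assumptions for illustration; the draft does not define any such key.

```python
import hashlib
from typing import Mapping


def composite_key(headers: Mapping[str, str]) -> str:
    """Hypothetical default key built from Message-ID plus Date.

    Two distinct messages that wrongly reuse a Message-ID still get
    different keys, as long as their Date fields differ.
    """
    message_id = headers.get("Message-ID", "").strip()
    date = headers.get("Date", "").strip()
    # A newline keeps the two fields apart inside the hashed string.
    combined = f"{message_id}\n{date}"
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


first = {"Message-ID": "<reused@example.net>",
         "Date": "Fri, 10 Jan 2014 00:01:00 +0100"}
second = {"Message-ID": "<reused@example.net>",
          "Date": "Sat, 11 Jan 2014 12:45:07 +0000"}
print(composite_key(first) == composite_key(second))  # False: Dates differ
```

The trade-off cuts both ways: if some intermediary rewrote Date on one copy, a genuine duplicate would no longer be detected.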

> > s.3
> > "an earlier Sieve execution."
> > reading on it is apparent that this is any number of executions limited
> > by the size of the FIFO cache and the maximum lifetime of entries in the
> > cache.
>
> Yes. So, what exactly is your comment here? I don't think it is useful
> to mention such detail early in the description.
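For s.3, this sketch shows what "an earlier Sieve execution" amounts to when the tracking list is bounded both by entry count and by entry lifetime, as Tom reads the draft. FifoTrackingList, max_entries and lifetime are hypothetical names, and the first-in-first-out eviction is the policy assumed in this exchange, not necessarily what an implementation must do.

```python
from collections import OrderedDict


class FifoTrackingList:
    """Hypothetical bounded duplicate tracking list with FIFO eviction.

    An "earlier execution" counts only if its entry has neither expired
    nor been pushed out by newer entries.
    """

    def __init__(self, max_entries: int, lifetime: float):
        self.max_entries = max_entries
        self.lifetime = lifetime
        self._entries = OrderedDict()  # value -> expiry time, oldest first

    def check_and_record(self, value: str, now: float) -> bool:
        """Return True if value is still tracked; otherwise record it."""
        # Drop entries whose lifetime has run out.
        for key in [k for k, exp in self._entries.items() if exp <= now]:
            del self._entries[key]
        if value in self._entries:
            return True
        self._entries[value] = now + self.lifetime
        # Over capacity: evict the oldest insertion first.
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
        return False


lst = FifoTrackingList(max_entries=2, lifetime=3600)
for t, v in enumerate(["a", "b", "c"]):
    lst.check_and_record(v, now=t)
print(lst.check_and_record("a", now=3))  # False: "a" was evicted by size
print(lst.check_and_record("c", now=4))  # True: "c" is still tracked
```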
>
> > "   Usage:  [":header" <header-name: string> /
> >                           ":uniqueid" <value: string>]
> >
> > Why have two ways of doing the same thing?  As I read it, this test is on
> > a header field, so why not have just ":header" with a default of
> > Message-ID?
>
> I am not sure what you mean here. The :uniqueid argument explicitly does
> not operate (directly) on a message header, but rather on some string
> value composed by the user (using the variables extension). This can
> consist of header field contents, but also of the message body or even
> some source other than the message being delivered.

If I understand it aright, the test for duplication can be on
 - message id
 - header field
 - something else
which are invoked by <nothing>, :header, :uniqueid respectively.
Since message id is just another header field, why not merge the first
two with the message id as the default header field if none other is
specified?
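Tom's proposal can be sketched as a single selection function in which the no-argument form is simply :header with Message-ID as the default, while :uniqueid supplies a string the script built itself. The function tracked_value and its parameters are hypothetical; making the two arguments mutually exclusive mirrors the "/" in the Usage line quoted earlier, and trimming only header-derived values follows the draft text quoted later in the thread.

```python
from typing import Mapping, Optional


def tracked_value(headers: Mapping[str, str],
                  header: Optional[str] = None,
                  uniqueid: Optional[str] = None) -> Optional[str]:
    """Hypothetical model of how the duplicate test picks its value.

    - uniqueid given: use the script-supplied string unchanged
    - otherwise: use the named header field, defaulting to Message-ID
    Header names are compared case-insensitively.
    """
    if header is not None and uniqueid is not None:
        raise ValueError(":header and :uniqueid are mutually exclusive")
    if uniqueid is not None:
        return uniqueid
    wanted = (header or "Message-ID").lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value.strip()  # header-derived values are trimmed
    return None


headers = {"Message-ID": " <m1@example.org> ", "X-Event-ID": "evt-42"}
print(tracked_value(headers))                              # <m1@example.org>
print(tracked_value(headers, header="message-id"))         # <m1@example.org>
print(tracked_value(headers, header="X-Event-ID"))         # evt-42
print(tracked_value(headers, uniqueid="built-by-script"))  # built-by-script
```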

> > And what happens if I use header field X in one execution and then
> > header field Y in another? I presume separate caches for X and Y, in
> > which case, duplicates may not be detected.
>
> No, there is only one 'cache' (in the document it is called the
> duplicate tracking list). See the following text:
>
>  The "duplicate" test MUST track an unique ID value independent of its
>  source.  This means that it does not matter whether values are
>  obtained from the message ID header, from an arbitrary header
>  specified using the ":header" argument or explicitly from the
>  ":uniqueid" argument.

see above
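The single-list rule in the quoted draft text can be modelled as a set that stores bare values with no record of where they came from. The class below is a hypothetical Python sketch: it shows that Tom's header-X-then-header-Y case is detected, and also that unrelated sources which happen to produce the same string will collide.

```python
class DuplicateTrackingList:
    """Hypothetical single tracking list: values are stored without source."""

    def __init__(self):
        self._seen = set()

    def check(self, value: str) -> bool:
        """Return True if value was seen before (from any source); record it."""
        if value in self._seen:
            return True
        self._seen.add(value)
        return False


tracking = DuplicateTrackingList()
msg1 = {"X-Ref": "abc"}
msg2 = {"Y-Ref": "abc"}
print(tracking.check(msg1["X-Ref"]))  # False: execution 1, value from header X
print(tracking.check(msg2["Y-Ref"]))  # True: execution 2, value from header Y
print(tracking.check("abc"))          # True: same string, as if from :uniqueid
```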


> Some examples follow this text. Do you mean that this needs to be
> clarified more?
>
> > The use of multiple fields
> > opens up all sorts of complications that need more explanation depending
> > on the concept of the scope of the operation, which I do not see clearly
> > explained.
>
> I am not sure what you mean here. I am assuming this comment is not
> relevant given the above. Please clarify otherwise.
>
> > "The user can explicitly control the
> >    length of this expiration time by means of the ":seconds" argument,
> >    which is always specified in seconds.  "
> >
> > seconds seems short to me.  On the IETF lists, I typically see a gap of
> > several hours between a message on one list and a message on another
> > list, with four hours being the norm.  I would regard 5 minutes as the
> > minimum and 36 hours, or perhaps less, as the maximum.
>
> Given the vacation-seconds extension, the use of a seconds granularity
> is not strange in the Sieve realm.
>
> I like the flexibility of using seconds (mailing lists are not the only
> application area of this extension), but I am not against changing it to
> :minutes per se. Do any other people have thoughts?

OK, but I just got a duplicate, once via the IETF announce list and once
via the IETF main list, and they were 10 hours apart.  For me, this is
typical: hours, not seconds.
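Some quick arithmetic on the figures in this exchange, sketched in Python: 5 minutes is 300 seconds, four hours is 14,400, the observed gap of 10 hours is 36,000, and 36 hours is 129,600. The helper still_tracked is hypothetical and assumes an entry expires exactly :seconds after it was recorded.

```python
HOUR = 3600  # seconds


def still_tracked(first_arrival: int, second_arrival: int, seconds: int) -> bool:
    """Hypothetical: is an entry recorded at first_arrival still live?

    Assumes the entry expires exactly `seconds` after it was recorded.
    """
    return second_arrival - first_arrival < seconds


gap = 10 * HOUR  # the announce-list vs main-list gap reported above
for label, seconds in [("5 minutes", 5 * 60),
                       ("4 hours", 4 * HOUR),
                       ("36 hours", 36 * HOUR)]:
    print(f"{label}: :seconds {seconds} -> caught: {still_tracked(0, gap, seconds)}")
# caught: False, False, True
```

The same 36-hour window could equally be written as 2160 minutes, so for Tom's case what arguably matters is the expiry value (and any default an implementation picks) rather than the unit.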

> > "By adding the ":header" argument with a message header
> >    field name, the content of the specified header field can be used as
> > "
> >
> > Does this apply to all headers? I assume it does, since if Message-ID
> > is not used, then it would be an X- proprietary one that would seem to
> > me to be the next best thing.
>
> Any header can be selected with this argument.
>
> > "   If the tracked unique ID value is extracted directly from a message
> >    header field, i.e., when the ":uniqueid" argument is not used,"
> >
> > you are saying that when uniqueid is not used, then a unique ID is used.
> > I think that this will cause confusion here and elsewhere - you need
> > another term than 'unique ID' as the collective noun for the identifying
> > string you are using in the equality comparison.
>
> Hmm, yeah. Any suggestions? :)

No good ones :-)  'identifier', 'message identifier': just something that
is visually different from unique-id, so preferably not incorporating
'unique'.

> > "leading and trailing whitespace MUST first be trimmed from the value"
> >
> > This is a can of worms.  Normalisation often appears on these lists
> > without, usually, a satisfactory answer, let alone the issues of i18n.
> > More needs to be considered here.
>
> This mainly serves as a means to prevent stray white space from messing
> with the string match. The core Sieve language also does this, for
> instance for the header test. And how would i18n be relevant here?

Read RFC 6532; that you have not referenced it makes me think that you
have not considered i18n, which I would regard as remiss nowadays.
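Two ways i18n can bite the trimming rule, sketched in Python under stated assumptions: RFC 6532 allows UTF-8 in header field values, and since :header can select any header, the tracked value may contain non-ASCII text. The draft text quoted above only requires trimming; the helper names are hypothetical, and the NFC normalization step is one possible answer, not something the draft specifies.

```python
import unicodedata


def trim_ascii(value: str) -> str:
    """Trim only ASCII space and horizontal tab (one reading of 'whitespace')."""
    return value.strip(" \t")


def trim_unicode(value: str) -> str:
    """Trim everything Python treats as whitespace, including U+00A0."""
    return value.strip()


def normalized(value: str) -> str:
    """Hypothetical extra step: trim, then apply Unicode NFC normalization."""
    return unicodedata.normalize("NFC", trim_unicode(value))


composed = "caf\u00e9"        # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301"     # 'e' + COMBINING ACUTE ACCENT (U+0301)
padded = "\u00a0" + composed  # leading NO-BREAK SPACE

print(trim_ascii(padded) == composed)                  # False: NBSP is not SP/HTAB
print(trim_unicode(padded) == composed)                # True
print(trim_unicode(decomposed) == composed)            # False: different code points
print(normalized(decomposed) == normalized(composed))  # True
```

Which characters count as whitespace, and whether canonically equivalent strings should match, are exactly the questions the draft would need to answer.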

> Regards,
>
> Stephan.