Re: [dmarc-ietf] I-D Action: draft-ietf-dmarc-dmarcbis-03.txt

Alessandro Vesely <vesely@tana.it> Thu, 19 August 2021 18:36 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/dmarc/6njfhM1SFb6uP91B3s2mfbsSz1I>

On Thu 19/Aug/2021 14:52:41 +0200 Todd Herr wrote:
> On Thu, Aug 19, 2021 at 7:19 AM Alessandro Vesely <vesely@tana.it> wrote:
>> On Wed 18/Aug/2021 22:17:57 +0200 Todd Herr wrote:
>>>
>>> The main update in this draft is removal of the "pct" tag, with an
>>> explanation as to why, and an introduction of the "t" tag in an effort
>>> to maintain the functionality provided today by "pct=0" and "pct=100".
>>
>> As held earlier, I disagree with such gratuitous breaking of the
>> existing installed base and published records.
> 
> I disagree with your characterization of removal of the "pct" tag as
> "gratuitous breaking"; the spec has long contained the following text:
> 
>     Only tags defined in this document or in later extensions, and thus
>     added to that registry, are to be processed; unknown tags MUST be
>     ignored.
>
> and so should a DMARC protocol without the "pct" tag be formally adopted,
> there should be no breaking of any existing DMARC implementations.


Excuse me, but I don't get it.  If a DMARC protocol without pct= is
formally adopted, an existing implementation that processes the tag
MUST be rewritten, re-tested, and re-installed.  IOW, the current
installation is broken.

As for existing records, they have to work with the updated 
implementations as well as with the ones now formally broken.


>> It goes without saying that domains who are publishing pct=0 will
>> slowly adapt by adding t=y and never removing pct.  Those who publish
>> pct=50 and are satisfied with it will have to give up, despite their
>> own operational experience.
>>
>> In any case, I object to the use of the Probability Mass Function as
>> applied to Binomial Distributions argument.  It presumes that the
>> percentage in question refers to the number of messages sent during a
>> given day, which was never specified.  The spec said "Percentage of
>> messages from the Domain Owner's *mail stream*".  The random function
>> applied to such stream is equivalent to computing a Monte Carlo
>> integration on a finite set.  Since *all samples* are eventually
>> considered, the result tends to the exact value.
> 
> I will continue to contend that the first sentence of the existing
> definition of the "pct" tag was incorrectly worded. It reads:
> 
>     Percentage of messages from the Domain Owner's mail stream to
>     which the DMARC policy is to be applied.
> 
> and as I've written in other threads, such phrasing makes no sense to me,
> because a DMARC policy cannot be applied to a message which passes DMARC
> verification checks.
> 
> I believe my argument to be supported by the existing definitions of
> "quarantine" and "reject", which read in part:
> 
>     quarantine:  The Domain Owner wishes to have email that fails the
>     DMARC mechanism check be treated by Mail Receivers as suspicious.
> 
>     reject:  The Domain Owner wishes for Mail Receivers to reject
>     email that fails the DMARC mechanism check.
> 
> There is nothing in the text there that talks of applying a policy of
> "quarantine" or "reject" to messages that pass the DMARC mechanism check,
> so it follows for me that the "pct" tag was never intended to apply to the
> entire mail stream, only to those messages that failed the check.


Agreed.  The language of RFC 7489 seems to talk about applying the 
DMARC check and then applying the policy if the check fails.  This in 
no way limits the mail stream to a single day.


> Given my
> assertion here, I believe that the Probability Mass Function as applied to
> Binomial Distributions is the correct argument to make against the pct tag,
> and I was rather pleased with myself in how I described it in an earlier
> revision of the spec:
> 
>           if (random mod 100) < pct then
> 
>              selected = true
> 
>           else
> 
>              selected = false
> 
> 
>     The pseudocode shown above is an example of that approximation,
>     relying on a random number generator to effectively produce a whole
>     number between 0 and 99, inclusive.  If that number is less than the
>     value of the "pct" tag, then a message producing a DMARC "fail"
>     result will be subject to the DMARC policy in question; if not, it
>     will be subject to the lesser policy.


Correct.
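
For illustration, a minimal Python sketch of the selection rule quoted
above.  The function name is_selected and the fixed seed are assumptions
made for this example; neither the spec nor the draft prescribes a
particular random source.

```python
import random


def is_selected(pct: int, rng: random.Random) -> bool:
    """Decide whether a DMARC-failing message gets the requested policy.

    Mirrors the quoted pseudocode: draw a whole number in 0..99 and
    compare it with the value of the "pct" tag.
    """
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    # randrange(100) returns an integer from 0 to 99 inclusive.
    return rng.randrange(100) < pct


# The seed is fixed only so that repeated runs give the same sequence.
rng = random.Random(2021)
print([is_selected(20, rng) for _ in range(10)])
```

With pct=0 the comparison can never succeed and with pct=100 it always
succeeds, which is the behavior the proposed "t" tag is meant to keep.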


>     Over time and given enough
>     iterations of the pseudocode, this should produce a roughly uniform
>     distribution of all values across the range, which we will refer to
>     going forward as "the pool".


This definition of "the pool" is not in the spec.  It is enough to 
have a decent random function.  Anyway...


>     However, mathematics teaches us that
>     the pool cannot be guaranteed to produce the desired result.
>     The sampling done to honor the "pct" tag is known in mathematics as a
>     Binomial Distribution, where a number of independent samples of the
>     pool are taken, with each one having the same probability of
>     producing a number that is less than the value of the "pct" tag.


Here you deviate from the spec.  A binomial distribution is
characterized by a fixed number of samples n and a probability p
(here pct/100).  The spec only talks about the probability.  Your
argument seems to assume that a mail stream is made of five messages
or fewer, which is completely nonsensical w.r.t. the common
understanding of DMARC usage.

Indeed, most servers end up relaying more than five messages to any 
receiver within a given mail stream.

Now, I confess that I don't have a formal definition of mail stream at 
hand.  I tend to consider a stream, for example, the series of 
messages signed with a given DKIM selector, or emitted by a given 
relay.  Roughly, a mail stream is the entity that a reputation tracker 
attaches a value to, isn't it?


>     A Binomial Distribution is expressed by the following function, known
>     as a probability mass function (PMF):
> 
>                   n!         x          n-x
>       f(x) = ----------- *  p  * (1 - p)
>              (n-x)! * x!
> 
>     In English, the PMF is a way to calculate the probability that x
>     items from a sample of n items will have the desired result when p is
>     the probability that any one item will have the desired result.


The spec does not say that pct=20 requires a receiver to apply the
policy to exactly one message out of the first five.  My
understanding is that it says that /on average/ a receiver will
apply the policy to 20% of the failed messages of the whole mail
stream.
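
The "on average" reading can be checked empirically.  Below is a
minimal simulation sketch, assuming each failed message is sampled
independently with the rule from the quoted pseudocode; the function
name applied_fraction, the stream sizes, and the seed are hypothetical
choices for this example.

```python
import random


def applied_fraction(pct: int, n_failed: int, seed: int = 0) -> float:
    """Fraction of n_failed failing messages that get the requested policy."""
    rng = random.Random(seed)  # seeded only for repeatability
    hits = sum(rng.randrange(100) < pct for _ in range(n_failed))
    return hits / n_failed


# Small streams scatter widely; long streams settle near pct/100.
for n in (5, 100, 10_000, 200_000):
    print(n, applied_fraction(20, n))
```

The observed fraction for a five-message stream can be anywhere from
0 to 1, while for long streams it should stay close to 0.20, which is
the sense in which the percentage applies to the stream as a whole.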


>     For example, for a DMARC policy record with pct=20, we let p = 0.2,
>     and to calculate the probability that 1 out of every 5 messages will
>     be assigned the requested policy, we have:
> 
>             5!           1            5-1
>        ----------- *  0.2  * (1 - 0.2)     =
>        (5-1)! * 1!
> 
> 
>           120       1     4
>          ----- * 0.2 * 0.8  = 5 * 0.2 * 0.4096 = 0.4096
>           24
> 
>          0.4096 * 100 = 40.96%
> 
>     The above demonstrates that for every five messages producing a DMARC
>     "fail" result, there is a slightly less than 41% chance that just one
>     of the five will have the requested policy applied to it.  The table
>     below shows the percent probability for all possible results:
> 
>          -----------------------------------
>          | X  | Percent chance that X of 5 |
>          |    |  will have policy applied  |
>          -----------------------------------
>          | 0  |        32.768%             |
>          -----------------------------------
>          | 1  |        40.96%              |
>          -----------------------------------
>          | 2  |        20.48%              |
>          -----------------------------------
>          | 3  |         5.12%              |
>          -----------------------------------
>          | 4  |         0.64%              |
>          -----------------------------------
>          | 5  |         < 0.1%             |
>          -----------------------------------


Whatever the definition of stream, the binomial distribution is not
the right tool to compute what's happening with realistic numbers.
Now it's about dinner time and I'm not willing to program a binomial
distribution for a decently high number of trials.  What would you
reckon is the probability that, out of 1000 failed messages, the
number X of messages to which the policy is applied satisfies
150 < X < 250?  Computing it that way would be an inefficient way to
estimate the precision of a Monte Carlo approximation.
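
For reference, the exact binomial sum asked about above is short to
evaluate.  This is a minimal sketch using the PMF from the quoted text;
the helper names binom_pmf and prob_strictly_between are assumptions
for this example, and the 1000-message stream with p = 0.2 is the
hypothetical case from the question.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def prob_strictly_between(lo: int, hi: int, n: int, p: float) -> float:
    """P(lo < X < hi) for X binomially distributed with parameters n, p."""
    return sum(binom_pmf(x, n, p) for x in range(lo + 1, hi))


# Reproduce the five-message table from the quoted text.
for x in range(6):
    print(x, f"{100 * binom_pmf(x, 5, 0.2):.3f}%")

# The 1000-message case: mean 200, standard deviation about 12.6.
print(prob_strictly_between(150, 250, 1000, 0.2))
```

Since 150 and 250 both lie nearly four standard deviations from the
mean of 200, the result should be very close to 1, in contrast with the
wide spread of the five-message table.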


> Your mileage may vary, of course, but the larger point here is that while
> you personally have objected to removal of the "pct" tag, there has
> seemingly been rough consensus supporting the idea of removing all values
> except 0 and 100, and quite a lot of agreement that having a tag named
> "pct" that only had valid values of 0 and 100 didn't make sense, so this
> rev is a first attempt to find a path to documenting something that does
> make sense, given that there is support for keeping the functionality that
> pct=0 provides.


I agree that a pct tag whose only valid values are 0 and 100 makes no
sense.


Best
Ale
--