Re: [dmarc-ietf] Some Proposed Language for a New pct Tag Defintion

Alessandro Vesely <vesely@tana.it> Mon, 02 August 2021 10:45 UTC

To: dmarc@ietf.org
References: <CAHej_8m4W_k_r9SV6reNJA7aMGFCkK451tjvQGtrPNwRtJwC8A@mail.gmail.com> <6e96de62-f387-bb42-a5da-0b7f74674a02@tana.it> <CAH48ZfzjQxRzqpGD9GqgeJcJ25V1cA3ke-x-N-bxO9--Lm4NUQ@mail.gmail.com> <6a5ba0e4-7bc1-0dee-4bb1-4fa1678d5c70@tana.it> <CAH48ZfwOPeFyjVWs6C7A0DfJ5uYFHCYQBnij8QZrBVsQeg6Msw@mail.gmail.com>
From: Alessandro Vesely <vesely@tana.it>
Message-ID: <b6e83339-412d-830b-c7d9-a2e0e038428c@tana.it>
Date: Mon, 2 Aug 2021 12:45:40 +0200
In-Reply-To: <CAH48ZfwOPeFyjVWs6C7A0DfJ5uYFHCYQBnij8QZrBVsQeg6Msw@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dmarc/AJQTgi1ffKntEbbbTK9NN4QEWRk>

On Sun 01/Aug/2021 20:56:55 +0200 Douglas Foster wrote:
> Ale, I tried to explain my objections in the original post.  However, it is a
> very important question, so I am happy to revise and extend my points.
> Forgive me for being long-winded; I am trying to be thorough because I see
> problems at many levels.


You're adding useless complications.


> Random Guessing can increase the volume of wrong decisions.
> 
> The basic math does not work.   Assume that a message sequence has a 
> probability P of being unwanted, and a probability of Q = 1-P of being 
> wanted.   Does it make sense to use a random number based on P to discard messages?
> 
> Probability of outcomes:
> 
> · P*P – unwanted messages, correctly blocked
> · P*Q – unwanted messages, incorrectly accepted
> · Q*P – wanted messages, incorrectly blocked
> · Q*Q – wanted messages, correctly accepted
> 
> Total error rate is 2*P*Q.  We have exchanged a one-sided error (allowing P
> unwanted messages) for a two-sided error distribution.  Does it improve the
> overall error rate?  Specifically, when is 2*P*Q < P ?
> 
> Cancelling P from both sides (P>0) yields 2*Q < 1 and Q < 0.5
> 
> If the message stream is more than 50% unwanted, then random guessing might
> produce fewer total errors than allow-all.  If the message stream is at least
> 50% wanted, then random guessing produces inferior results.


In plain English, if you have more spam than ham, then blocking at random is 
correct in most cases.  That's an obvious statement which adds nothing to the 
discussion.
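
A minimal sketch of the quoted arithmetic, for readers who want to check the
numbers.  It assumes the model exactly as stated in the quoted text: each
message is blocked independently with probability P, whether it is wanted or
not.  The function name error_rates is hypothetical and is not part of any
DMARC software.

```python
def error_rates(p):
    """Return (allow_all_error, random_block_error) for an unwanted fraction p.

    Assumed model, taken from the quoted text: every message is blocked
    independently with probability p, regardless of whether it is wanted.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be within [0, 1]")
    q = 1.0 - p
    allow_all = p              # allow-all: every unwanted message is an error
    random_block = 2 * p * q   # P*Q accepted spam plus Q*P blocked ham
    return allow_all, random_block


for p in (0.2, 0.5, 0.8):
    allow_all, random_block = error_rates(p)
    print(f"P={p:.1f}  allow-all={allow_all:.2f}  random={random_block:.2f}")
```

The two error rates cross at P = 0.5, which is the Q < 0.5 condition derived
in the quoted text.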


> Other filtering stages will raise Q and lower P
> 
> Since the specific issue is failed DMARC Authentication, we also need to 
> consider how this task fits into the evaluation process.    I believe my 
> process is typical:
> 
> ·First, messages from known-bad senders are blocked.
> ·Second, sender authentication is performed, at which point some messages may 
>  be discarded.
> ·Third, content filtering is applied, and suspicious content is blocked.
> ·Fourth, end-user activity occurs, where some messages are ignored or discarded.
> 
> One effect of the first stage is that it lowers P and raises Q.   During sender 
> authentication, Q is likely to be above 50% even if the initial mail stream has 
> a Q below 50%.


The purpose of authentication is to recognize senders by name rather than by IP 
number.  Thus, according to the sense of "known-bad senders", authentication 
can be considered a prerequisite of the first stage.  If authentication fails, 
you don't know who the sender is, therefore you don't know if it's good or bad.


> If a false negative occurs during sender authentication, causing an unwanted 
> message to be allowed, the message may be blocked during content filtering or 
> it may be ignored by the user.  Consequently, if the probability P is 
> applicable during sender authentication, the probability of a threat being 
> successful is less than P.


No.  A successful authentication of a spammy message is not a false negative. 
The fact that a message is unwanted has nothing to do with DMARC.


> Random guessing will increase the volume of unrecoverable errors.
> 
> If a false positive occurs during sender authentication, causing a wanted 
> message to be blocked, there is no opportunity for recovery.


Actually, an opportunity for recovery exists.  The sender can have feedback
mechanisms, such as 5yz SMTP replies, delivery notifications, return receipts
or other web-based actions.  It can use that feedback to recover from
authentication errors.  Such errors happen in an apparently random fashion too.
For example, when the word "from" followed by a space appears at the beginning
of a line, some agents insert a greater-than sign ('>') before it, thereby
breaking a DKIM signature.  As soon as the sender recognizes that delivery
failed, it can resend the same message several times until, by chance, it gets
a toss greater than its pct.  Phishers, OTOH, are known for not retrying.

The above is a use case for pct != 0 and pct != 100.
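
A rough sketch of the retry arithmetic behind this use case.  It assumes that
the receiver makes an independent toss for each attempt and that a failing
message which escapes the toss is not rejected.  Both function names are
hypothetical.

```python
def delivery_probability(pct, attempts):
    """Chance that at least one of `attempts` independent sends escapes the toss."""
    if not 0 <= pct <= 100:
        raise ValueError("pct must be within 0..100")
    if attempts < 0:
        raise ValueError("attempts must not be negative")
    # Every attempt is caught with probability pct/100 (assumed independent).
    return 1.0 - (pct / 100.0) ** attempts


def attempts_needed(pct, target):
    """Smallest number of sends whose delivery probability reaches `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    if pct == 100:
        raise ValueError("with pct=100 no failing message escapes the policy")
    attempts = 1
    while delivery_probability(pct, attempts) < target:
        attempts += 1
    return attempts


for pct in (10, 50, 90):
    print(pct, attempts_needed(pct, 0.95))
```

Under these assumptions the number of retries grows quickly as pct approaches
100, which is why the published value can serve as a hint for when to give up.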


> Therefore, false positives are a greater problem than false negatives, and
> the random guessing algorithm has the effect of replacing false negatives
> with false positives.

Replacing what...?


> Sender’s probability has no relation to Evaluator’s probability
> 
> For any single domain, incoming messages can be broken into three categories:
> 
> ·Legitimately-sourced messages which arrive with valid credentials.
> ·Legitimately-sourced messages which arrive with failed credentials.
> ·Impersonation messages which arrive with failed credentials.
> 
> For simplicity, assume that sender and receiver interests are aligned – the 
> receiver wants to accept all legitimately-sourced messages from the domain.   
> Since the sender is moving toward P=REJECT and the recipient wants to enforce 
> P=REJECT, we will also assume that mailing lists are not part of the mail stream.
> 
> Neither sender nor receiver know the volume of unwanted impersonating 
> messages.   This means that the denominator is unknown, but would be determined 
> by the volume of impersonation + legitimate messages.   The numerator for 
> computing wanted message rates (Q) is all of the legitimate messages.  The 
> numerator for computing unwanted message rates (P) is all of the impersonation 
> messages.
> 
> Because the recipient wants all of the legitimately-sourced messages, the
> percentage of legitimate messages sent with imperfect credentials is irrelevant.


It is not irrelevant if the receiver rejects on DMARC fail.


> Assume that the source domain knows the volume of messages which are sent
> without complete credentials, and publishes a percentage based on that
> knowledge.  Can the evaluator benefit from that information?  I don’t think so.


Certainly not.  In the use case outlined above, however, the published pct can
hint to the sender how many times to retry before resorting to something else.


> Credentials at origin are determined by whether the source is configured to 
> apply correct SPF and DKIM credentials or not.   The source domain could 
> determine message volumes by server to compute a weighted statistic for 
> percentage of messages with correct credentials.  But any single evaluator
> will need to see the same weighted distribution of message sources.  It may not
> receive any messages from non-compliant servers, it may receive messages only
> from non-compliant servers, or any other possible weight distribution.
> Applying the source-domain’s percentage estimate to the received message stream 
> would only make sense if the weighting is comparable.


I don't think pct can be calculated from the percentage of failed
authentications.  Even if a sender had the same percentage of failed
authentications at every receiver, it would still make no sense to set pct to
that value.  What would a sender gain?


> More importantly, the assumed goal for both sender and receiver is to have all
> legitimately-sourced messages accepted.  Arbitrarily blocking some wanted
> messages, for the sake of notifying about credentialling problems, works 
> against the goal of the evaluator and his user base.  It is too high a price to 
> pay.


It is still a percentage of the price you pay with pct=100.
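
To put a number on that, here is a small sketch.  It assumes the receiver
applies the policy to exactly pct percent of the legitimate messages that fail
authentication, on average.  The function name and the figure of 200 failing
messages are made up for illustration.

```python
def expected_rejections(failing_legitimate, pct):
    """Expected number of legitimate-but-failing messages hit by the policy."""
    if not 0 <= pct <= 100:
        raise ValueError("pct must be within 0..100")
    if failing_legitimate < 0:
        raise ValueError("message count must not be negative")
    return failing_legitimate * pct / 100.0


# Hypothetical volume: 200 legitimate messages that fail authentication.
for pct in (0, 10, 50, 100):
    print(f"pct={pct:3d}  expected rejections={expected_rejections(200, pct):.0f}")
```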


> On Sun, Aug 1, 2021 at 5:13 AM Alessandro Vesely <vesely@tana.it> wrote:


I snipped the original message.  Interested readers have the whole thread
stored, and it is in the dmarc-ietf archive anyway.


Best
Ale
--