Re: [Ietf-dkim] DKIM Replay Problem Statement and Scenarios -01 draft posted

Scott Kitterman <ietf-dkim@kitterman.com> Wed, 15 February 2023 13:39 UTC

From: Scott Kitterman <ietf-dkim@kitterman.com>
To: ietf-dkim@ietf.org
Cc: Alessandro Vesely <vesely@tana.it>
Date: Wed, 15 Feb 2023 08:38:47 -0500
Message-ID: <4889354.vXGf7xteCD@localhost>
In-Reply-To: <ee7398b2-aa9a-6a48-d746-d80bec804fd0@tana.it>
References: <CAAFsWK3B7OfcRFwayzM=nZ1TuHoK93vFSTfBd73mGEvq1Ti+fg@mail.gmail.com> <3176719.QUIsX6EK59@localhost> <ee7398b2-aa9a-6a48-d746-d80bec804fd0@tana.it>
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="us-ascii"
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-dkim/GrPCsSfTu44QZ20GqIIdtIlZ4pw>
Subject: Re: [Ietf-dkim] DKIM Replay Problem Statement and Scenarios -01 draft posted

On Wednesday, February 15, 2023 5:23:34 AM EST Alessandro Vesely wrote:
> On Tue 14/Feb/2023 23:42:36 +0100 Scott Kitterman wrote:
> > On Tuesday, February 14, 2023 4:16:00 PM EST Evan Burke wrote:
> >> On Tue, Feb 14, 2023 at 11:44 AM Michael Thomas <mike@mtcc.com> wrote:
> >>> On Tue, Feb 14, 2023 at 11:18 AM Michael Thomas <mike@mtcc.com> wrote:
> >>>> Have you considered something like rate limiting on the receiver side for
> >>>> things with duplicate msg-id's? Aka, a tar pit, iirc?
> >> 
> >> I believe Yahoo does currently use some sort of count-based approach to
> >> detect replay, though I'm not clear on the details.
> >> 
> >>>> As I recall that technique is sometimes not suggested because (a) we can't
> >>>> come up with good advice about how long you need to cache message IDs to
> >>>> watch for duplicates, and (b) the longer that cache needs to live, the
> >>>> larger of a resource burden the technique imposes, and small operators
> >>>> might not be able to do it well.
> >>> 
> >>> At maximum, isn't it just the x= value? It seems to me that if you don't
> >>> specify an x= value, or it's essentially infinite, they are saying they
> >>> don't care about "replays". Which is fine in most cases and you can just
> >>> ignore it. Something that really throttles down x= should be a tractable
> >>> problem, right?
> 
> The ratio between duplicate count and x= is the spamming speed.
> 
> >>> But even at scale it seems like a pretty small database in comparison to
> >>> the overall volume. It would be easy for a receiver to just prune it
> >>> after a day or so, say.
> >> 
> >> I think count-based approaches can be made even simpler than that, in fact.
> >> I'm halfway inclined to submit a draft using that approach, as time permits.
> >
> > I suppose if the thresholds are high enough, it won't hit much in the way
> > of legitimate mail (as an example, I anticipate this message will hit at
> > least hundreds of mail boxes at Gmail, but not millions), but of course
> > letting the first X through isn't ideal.
> 
> Scott's message hit my server exactly once.  Counting is a no-op for small
> operators.
> 
> > If I had access to a database of numerically scored IP reputation values (I
> > don't currently, but I have in the past, so I can imagine this at least), I
> > think I'd be more inclined to look at the reputation of the domain as a
> > whole (something like average score of messages from an SPF validated
> > Mail From, DKIM validated d=, or DMARC pass domain) and the reputation of
> > the IP for a message from that domain and then if there was sufficient
> > statistical confidence that the reputation of the IP was "bad" compared
> > to the domain's reputation I would infer it was likely being replayed and
> > ignore the signature.
>
> Some random forwarder in Nebraska can be easily mistaken for a spammer that
> way.  Reputation is affected by email volume.  Even large operators have
> little knowledge of almost silent MTAs.
> 
> Having senders' signatures transmit the perceived risk of an author would
> contribute an additional evaluation factor here.  Rather than discard
> validated signatures, have an indication to weight them.  (In that respect,
> let me note the usage of ARC as a sort of second class DKIM, when the
> signer knows nothing about the author.)

Any reputation-based solution does have limits at the low end of the scale.
Small mail sources (such as your random Nebraska forwarder) generally will have
no reputation rather than a negative one, and so wouldn't get penalized in a
scheme like the one I suggested.  This does, however, highlight where the
performance challenge is: we've moved it from duplicate detection to rapid
assessment of reputation for hosts that have sudden volume increases.
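
A minimal sketch of that comparison follows, to make the "no reputation is not
a negative reputation" point concrete.  Everything in it is an assumption for
illustration: the Reputation record, the score scale (higher is better), the
minimum sample count, and the use of a simple z-score as the "statistical
confidence" test.  None of it comes from a real reputation feed or product.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional


@dataclass
class Reputation:
    """Hypothetical summary of scored messages; higher mean is better."""
    mean: float    # average message score
    stdev: float   # spread of the message scores
    count: int     # number of messages scored


def likely_replay(domain: Reputation,
                  ip: Optional[Reputation],
                  min_samples: int = 30,
                  z_threshold: float = 3.0) -> bool:
    """Return True if the IP looks confidently worse than the signing domain.

    `ip` is the reputation of the sending IP for mail signed by this domain.
    """
    # No (or too little) history for this IP: that is "no reputation",
    # not a bad one, so the signature is not discounted.
    if ip is None or ip.count < min_samples:
        return False

    # Standard error of the IP's mean score.
    stderr = ip.stdev / sqrt(ip.count)
    if stderr == 0:
        return ip.mean < domain.mean

    # How many standard errors the IP sits below the domain's average.
    z = (domain.mean - ip.mean) / stderr
    return z >= z_threshold


domain = Reputation(mean=0.8, stdev=0.1, count=100_000)
quiet_forwarder = Reputation(mean=0.2, stdev=0.2, count=5)
sudden_blaster = Reputation(mean=0.2, stdev=0.2, count=5_000)

print(likely_replay(domain, None))             # unknown host
print(likely_replay(domain, quiet_forwarder))  # too few samples to judge
print(likely_replay(domain, sudden_blaster))   # confidently worse than domain
```

The cost in this sketch sits where the paragraph above says it does: the check
itself is cheap, but it is only useful if per-IP scores accumulate fast enough
to cross the sample minimum while a burst is still in progress.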

I think that's fine, as that's not a problem that's unique to this challenge.
Ultimately, I think if replay attacks end up more complicated because, instead
of blasting 1,000,000 messages from one host, they have to trickle 1,000
messages from 1,000 hosts, it's a win.

I don't think this is a problem that's going to have a single mechanical
solution that makes it go away.  This is substantially about making this
particular technique less effective, so maybe they move on to something else or
at least less bad stuff gets delivered.

> > I think that approaches the same effect as a "too many dupes" approach
> > without the threshold problem.  It does require reputation data, but I
> > assume any entity of a non-trivial size either has access to their own or
> > can buy it from someone else.
> 
> DNSWLs exist.

I'm not sure how that's relevant.  Please expand on this if you think it's 
important.

Scott K