Re: [dmarc-ietf] Doing a tree walk rather than PSL lookup

Alessandro Vesely <vesely@tana.it> Tue, 24 November 2020 19:24 UTC

To: John Levine <johnl@taugh.com>, dmarc@ietf.org
References: <20201124170351.C430227DFEBF@ary.qy>
From: Alessandro Vesely <vesely@tana.it>
Message-ID: <36f4f840-0911-56f5-185b-3f60166eab47@tana.it>
Date: Tue, 24 Nov 2020 20:24:23 +0100
Archived-At: <https://mailarchive.ietf.org/arch/msg/dmarc/WKNc3i4Zw00zUEHItyzQ6TQFcw8>
Subject: Re: [dmarc-ietf] Doing a tree walk rather than PSL lookup

On Tue 24/Nov/2020 18:03:51 +0100 John Levine wrote:
> In article <efa0117e-5b17-800d-820d-b5d2413c6075@tana.it> you write:
>>> One of the points of the tree walk is to get rid of the PSL processing.
>>
>> The PSL processing is a local lookup on an in-memory suffix tree.  How is it a 
>> progress to replace it with a tree walk?  A PSL search is lightning faster than 
>> even a single DNS lookup, isn't it?
> 
> You have to download a copy of the PSL, read it into your program, and
> parse it into some internal form. The PSL is over 200K of text and
> 13,000 lines, so while it's not a large file, it's not zero either.


Right.  The optimal solution would be to load the list and the lookup algorithm
as a shared object.  Currently, my filter keeps its own private copy.  Still, I
reload the filter so rarely that the time spent parsing the file goes unnoticed.
By comparison, loading the virus database takes much longer.
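
To make the "parse once, look up locally" point concrete, here is a minimal
sketch in Python.  It is not my filter's code.  The names parse_psl and
organizational_domain are made up for illustration, the sample list is a tiny
hand-written excerpt in PSL syntax, and plain hash sets stand in for the
suffix tree.

```python
def parse_psl(text):
    """Parse PSL-syntax text into (rules, wildcards, exceptions) sets."""
    rules, wildcards, exceptions = set(), set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        rule = line.split()[0].lower()
        if rule.startswith("!"):
            exceptions.add(rule[2 - 1:])   # "!www.ck" -> "www.ck"
        elif rule.startswith("*."):
            wildcards.add(rule[2:])        # "*.ck"    -> "ck"
        else:
            rules.add(rule)
    return rules, wildcards, exceptions


def public_suffix_len(labels, psl):
    """Number of trailing labels that form the public suffix."""
    rules, wildcards, exceptions = psl
    n = len(labels)
    tails = [".".join(labels[i:]) for i in range(n)]  # longest first
    for i, tail in enumerate(tails):
        if tail in exceptions:          # exception rules take priority
            return n - i - 1
    for i, tail in enumerate(tails):
        parent = tails[i + 1] if i + 1 < n else ""
        if tail in rules or parent in wildcards:
            return n - i                # longest matching rule wins
    return 1                            # default rule "*"


def organizational_domain(domain, psl):
    """Public suffix plus one label, or None if domain is itself a suffix."""
    labels = domain.lower().rstrip(".").split(".")
    k = public_suffix_len(labels, psl)
    if len(labels) <= k:
        return None
    return ".".join(labels[len(labels) - k - 1:])


# Hand-written sample in PSL syntax (an assumption, not the real list).
SAMPLE = "// sample\ncom\nuk\nco.uk\n*.ck\n!www.ck\n"
psl = parse_psl(SAMPLE)
print(organizational_domain("mail.example.co.uk", psl))
```

The parsing cost is paid once per filter start.  Each later lookup is a
handful of set membership tests and involves no network traffic.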


> If you're lucky you can amortize your PSL parsing across multiple
> DMARC checks, but your DNS cache amortizes DNS lookups across multiple
> checks, too.


I doubt DNS caching would give comparable efficiency, even if my mail server had
a dedicated caching resolver.  Mail servers that rely on stub resolvers would
experience a noticeable degradation.
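
For comparison, a rough sketch of what a tree walk costs.  It assumes the
simplest possible variant, one _dmarc TXT query per level from the full name
toward the root, which may not match the actual proposal.  The function
tree_walk and the txt_lookup callable are hypothetical, and a dict plays the
role of the DNS so the example runs without a resolver.

```python
def tree_walk(domain, txt_lookup):
    """Query _dmarc.<name> at each level, starting from the full domain.

    txt_lookup is any callable mapping a query name to a TXT string or None.
    Returns (name where a record was found, record, number of queries).
    """
    labels = domain.lower().rstrip(".").split(".")
    queries = 0
    for i in range(len(labels)):
        name = ".".join(labels[i:])
        queries += 1
        record = txt_lookup("_dmarc." + name)
        if record is not None and record.startswith("v=DMARC1"):
            return name, record, queries
    return None, None, queries


# Fake zone data standing in for the DNS (an assumption for illustration).
zone = {"_dmarc.example.com": "v=DMARC1; p=none"}

print(tree_walk("a.b.example.com", zone.get))
print(tree_walk("a.b.example.org", zone.get))
```

In this variant, every level without a record costs one more query.  A cache
can answer repeated queries, but each distinct name still has to be resolved
at least once.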


> The DNS approach has the advantage that you don't have to depend on a
> third party's text file updated at unknown intervals,


Agreed.


> and also makes it easier to deal with what I've called the Holy Roman Empire
> problem.


Uh?  The Holy Roman Empire became a disconnected tree soon after Charlemagne's
death, so that sounds like one of the dystopian scenarios that ISOC conceived a
few years ago.  I'm not sure what you mean.


Best
Ale
--