Re: [dmarc-ietf] Doing a tree walk rather than PSL lookup

ned+dmarc@mrochek.com Wed, 25 November 2020 02:48 UTC

Return-Path: <ned+dmarc@mrochek.com>
X-Original-To: dmarc@ietfa.amsl.com
Delivered-To: dmarc@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 13D8B3A0DD5 for <dmarc@ietfa.amsl.com>; Tue, 24 Nov 2020 18:48:47 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.099
X-Spam-Level:
X-Spam-Status: No, score=-2.099 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=mrochek.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 5tj0MxnK9bmx for <dmarc@ietfa.amsl.com>; Tue, 24 Nov 2020 18:48:45 -0800 (PST)
Received: from plum.mrochek.com (plum.mrochek.com [172.95.64.195]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 5DF583A0DD4 for <dmarc@ietf.org>; Tue, 24 Nov 2020 18:48:45 -0800 (PST)
Received: from dkim-sign.mauve.mrochek.com by mauve.mrochek.com (PMDF V6.1-1 #35243) id <01RSEEL45X7K00D6TO@mauve.mrochek.com> for dmarc@ietf.org; Tue, 24 Nov 2020 18:45:02 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mrochek.com; s=201712; t=1606272302; bh=heiy1l7U/EBDbCe10XyV2iEOqOYdmZKYaR7mSsD1GsI=; h=From:Cc:Date:Subject:In-reply-to:References:To:From; b=N0et3/GUEUNmLBiG1NqWm6hxbaNW9vHoeVTBUNa/WANmrh2iQOhNH7v5wZ2I/f+PS Y66KQDYYvfLZJXlFJF1kD8Gc/uEu/Qg1sdKHn0BwzfFm82i518IlAgo2vDkRdK4WwQ TY7Fe8xEvqYvX0x6q9edZfAbzJUE6oz7ZLxROHT0=
MIME-version: 1.0
Content-transfer-encoding: 7bit
Content-type: TEXT/PLAIN; CHARSET="us-ascii"
Received: from mauve.mrochek.com by mauve.mrochek.com (PMDF V6.1-1 #35243) id <01RSDRSDRS8G0055SH@mauve.mrochek.com> (original mail from NED@mauve.mrochek.com) for dmarc@ietf.org; Tue, 24 Nov 2020 18:44:59 -0800 (PST)
From: ned+dmarc@mrochek.com
Cc: dmarc@ietf.org, vesely@tana.it
Message-id: <01RSEEL1VCRS0055SH@mauve.mrochek.com>
Date: Tue, 24 Nov 2020 18:23:11 -0800
In-reply-to: "Your message dated Tue, 24 Nov 2020 12:03:51 -0500" <20201124170351.C430227DFEBF@ary.qy>
References: <efa0117e-5b17-800d-820d-b5d2413c6075@tana.it> <20201124170351.C430227DFEBF@ary.qy>
To: John Levine <johnl@taugh.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dmarc/20-CaR9cUs37vk-4cvWSLULRw_U>
Subject: Re: [dmarc-ietf] Doing a tree walk rather than PSL lookup
X-BeenThere: dmarc@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Domain-based Message Authentication, Reporting, and Compliance \(DMARC\)" <dmarc.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/dmarc>, <mailto:dmarc-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/dmarc/>
List-Post: <mailto:dmarc@ietf.org>
List-Help: <mailto:dmarc-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/dmarc>, <mailto:dmarc-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 25 Nov 2020 02:48:47 -0000

> In article <efa0117e-5b17-800d-820d-b5d2413c6075@tana.it> you write:
> >> One of the points of the tree walk is to get rid of the PSL processing.
> >

> > The PSL processing is a local lookup on an in-memory suffix tree.  How is it a
> > progress to replace it with a tree walk?  A PSL search is lightning faster than
> > even a single DNS lookup, isn't it?

We added PSL support for other reasons some time back.

FWIW, there were some intricacies I didn't like in the suffix tree approach.
And since memory use really isn't a consideration for something this small, I
opted for multiple lookups in a hash table instead. We have our own hash table
code and loadable format, but there are compile-to-loadable-format tools out
there if you don't want to roll your own.

> You have to download a copy of the PSL,

Hopefully you're doing that in the background...

> read it into your program, and
> parse it into some internal form. The PSL is over 200K of text and
> 13,000 lines, so while it's not a large file, it's not zero either.

Our implementation processes a bunch of text files from various sources, of
which the PSL is less than 5% of the total size, into our loadable format.

I just checked, and it takes a litle over 2 CPU seconds to run the program. On
an 333Mhz Alpha.

I really don't think this is a concern.

> If you're lucky you can amortize your PSL parsing across multiple
> DMARC checks, but your DNS cache amortizes DNS lookups across mutiple
> checke, too.

Parse only happens when one of the input files changes. Why would you
implement it any other way? 

> The DNS approach has the advantage that you don't have to depend on a
> third party's text file updated at unknown intervals, and also makes
> it easier to deal with what I've called the Holy Roman Empire problem.

So you download and check every week or whatever. Still not seeing a problem.

Finally, there's the HRE issue. Which I guess could be a problem. But so can
DNS usage, which is subject to what I'll now call the fiefdom problem, where
you're required to use a DNS service operated by someone else in your org and
they are ... problematic.

Of course DNS use is unavoidable. Even so, the fiefdom problem has led to
requests to minimize DNS usage by any means possible.

The HRE problem remains theoretical for us at least.

				Ned