[dmarc-ietf] Tree walk max depth concern and impact on reporting for domain owners working as expected

Seth Blank <seth@valimail.com> Wed, 05 April 2023 22:54 UTC

From: Seth Blank <seth@valimail.com>
Date: Wed, 05 Apr 2023 15:54:15 -0700
Message-ID: <CAOZAAfN9AN5gYTsB420o9K0a9iWT_j3Uz=icU1x86fe4q8DuxQ@mail.gmail.com>
To: IETF DMARC WG <dmarc@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dmarc/GoExCeJYWhxnvH8lwjbr7nAcFh4>

I believe there’s a critical use case we missed with the tree walk,
specifically around policy and reporting discovery, not determining
organizational domain alignment.

One of the reasons we discussed a tree walk for DMARC bis in the first
place was a specific problem with larger, more complicated organizations for
whom policy discovery in RFC 7489 does not work as needed. These organizations
include governments, universities, and healthcare organizations, and they
have shared some details both on this list and at M3AAWG.

Specifically, in complex organizations whose sub-organizations have
separate reporting and policy needs, the sub-organization policy/reporting
was being skipped by the policy lookup, and the reports wound up with
recipients unable to act on them or route them to the right place.

In the most extreme cases (US federal government), we see the following
pattern with some frequency:

A bounce domain of bounce.sender-subdomain.division.agency.department.gov
with an RFC5322.From generally of division.agency.department.gov or
agency.department.gov. Sometimes the full bounce domain is in the From
(especially when things are first set up but not yet configured well using
DMARC, hence the need for appropriate reporting!). This is rare, but it is
essential to get reporting right when it happens.

The reporting and policy that matter here are those of the division or
agency, and rarely the department. In the case where we have a long PSD,
this reporting and policy would be skipped by the currently proposed
algorithm with N=5. For example, with
sender.agency.department.example.gov.ccTLD, the current discovery mechanism
would skip the agency policy and reporting (which is what is wanted) and
instead land on the department one (which is not).
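
To make the skip concrete, here is a minimal Python sketch of the label-removal step (step 5 of the Tree Walk in rev -27) applied to the example name above. The function name `next_target` and the parameter `n` are illustrative only and do not come from the draft; `ccTLD` is the placeholder label from the example.

```python
def next_target(domain: str, n: int = 5) -> str:
    """Return the next Tree Walk target after `domain` (step 5, limit n)."""
    labels = domain.split(".")
    if len(labels) < n:
        # Short names: drop only the left-most label.
        return ".".join(labels[1:])
    # Long names: jump so that only n - 1 labels remain.
    return ".".join(labels[-(n - 1):])


author_domain = "sender.agency.department.example.gov.ccTLD"
print(next_target(author_domain, n=5))   # department.example.gov.ccTLD
print(next_target(author_domain, n=10))  # agency.department.example.gov.ccTLD
```

With N=5 the six-label name jumps straight to the department level, so a record published at `_dmarc.agency.department.example.gov.ccTLD` is never queried.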

When we first proposed the tree walk, we thought we’d just walk up 5
labels, and then stop if nothing had been found. This handles this complex
organization use case cleanly for legitimate mail, which was the initial
intent. However, it a) does not handle the abuse use case well (what policy
do you choose if you exhaust the lookups without a policy answer?), and b)
as the group rightly pointed out, misses the use case of determining
organizational domain alignment, which is essential for DMARC overall. The
current jump in the algorithm handles both (a) and (b) effectively.

John Levine suggested I was twisting myself in knots trying to solve both
these use cases, when the simplest solution was to leave the algorithm
exactly as it is, and just revisit N.

So how do we handle this? What’s the worst case? Looking at the above
example, the longest “complex org” would be 5 labels long. I think we’ve
already agreed, backed by data from the PSL, that the longest PSD would be
4 labels long.

This seems to say that an N of (max complex-org labels + max PSD labels
+ 1) = 5 + 4 + 1 = 10 would cover even the most complex use cases without
needing to change the normative part of the document. Maybe there’s a
better N at 8 or 9. We should discuss. Below, I’ve proposed some
explanatory text and updated examples if we do want to revisit. I’ve used
N=10 as a placeholder, so if we end up at a lesser N, we only need to
remove examples, not generate more.
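
A small sketch of the arithmetic, assuming the jump in step 5 stays as it is: a name with x labels is queried at its full depth and then at depths N-1 down to 1, so any depth d with N <= d < x is never queried. The helper name `skipped_depths` is hypothetical, and the 10-label worst case simply restates the 5 + 4 + 1 estimate above.

```python
from typing import List


def skipped_depths(x: int, n: int) -> List[int]:
    """Label depths never queried for an x-label name with Tree Walk limit n."""
    # The walk queries depth x first, then depths min(x, n) - 1 down to 1,
    # so only depths from n up to x - 1 can be skipped.
    return list(range(n, x))


worst_case_labels = 5 + 4 + 1  # longest complex org + longest PSD + 1
for n in (5, 7, 10):
    print(n, skipped_depths(worst_case_labels, n))
```

Under these assumptions, N=5 skips depths 5 through 9 of a 10-label name, while N=10 skips none.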

To be clear, due to the current policy discovery mechanics (check author
domain then jump to organizational domain), I'm not aware of any of these
complex orgs setting DMARC policies on Author Domains at such a depth. That
is, N=5 today would not break anything currently in place. However, the tree
walk now enables these complex orgs to set policy much deeper in their
hierarchy, which would then potentially not work as expected and possibly
send reports to the wrong destination due to the current N=5.

I don't feel strongly about N=10, but I do feel strongly that N=5 is
insufficient. My gut feel is that 6 or 7 is likely more than enough to
cover all real world examples, but it's a gut feel only and not backed by
data.

Have at it!

Seth, as an individual

---

I propose that the existing text in rev -27 be slightly modified. The
current text is shown first (OLD), followed by the text with my proposed
modifications (NEW):

OLD

##  DNS Tree Walk {#dns-tree-walk}

The DMARC protocol defines a method for communicating information through
the publishing of records in DNS. Both the content of the records and their
location in the DNS hierarchy are used for two purposes: policy discovery
(see Section 4.7
<https://datatracker.ietf.org/doc/html/draft-ietf-dmarc-dmarcbis-27#dmarc-policy-discovery>)
and Organizational Domain determination (see Section 4.8
<https://datatracker.ietf.org/doc/html/draft-ietf-dmarc-dmarcbis-27#organizational-domain-discovery>).

The relevant DMARC record for these purposes is not necessarily the DMARC
policy record found in DNS at the same level as the name label for the
domain in question. Instead, some domains will inherit their DMARC policy
records from parent domains one level or more above them in the DNS
hierarchy. Similarly, the Organizational Domain may be found at a higher
level in the DNS hierarchy.

These records are discovered through the technique described here, known
colloquially as the "DNS Tree Walk". The target of any DNS Tree Walk is a
valid DMARC policy record, but the rules defining required content for that
record depend on the reason for performing the Tree Walk.

To prevent possible abuse of the DNS, a shortcut is built into the process
so that domains that have more than five labels do not result in more than
five DNS queries.

The generic steps for a DNS Tree Walk are as follows:


   1. Query the DNS for a DMARC TXT record at the appropriate starting point
      for the Tree Walk. A possibly empty set of records is returned.

   2. Records that do not start with a "v=" tag that identifies the current
      version of DMARC are discarded. If multiple DMARC records are returned,
      they are all discarded. If a single record remains and it contains a
      "psd=n" tag, stop.

   3. Determine the target for additional queries (if needed; see the note
      in Section 4.8
      <https://datatracker.ietf.org/doc/html/draft-ietf-dmarc-dmarcbis-27#organizational-domain-discovery>),
      using steps 4 through 8 below.

   4. Break the subject DNS domain name into a set of ordered labels. Assign
      the count of labels to "x", and number the labels from right to left;
      e.g., for "a.mail.example.com", "x" would be assigned the value 4,
      "com" would be label 1, "example" would be label 2, "mail" would be
      label 3, and so forth.

   5. If x < 5, remove the left-most (highest-numbered) label from the
      subject domain. If x >= 5, remove the left-most (highest-numbered)
      labels from the subject domain until 4 labels remain. The resulting
      DNS domain name is the new target for the next lookup.

   6. Query the DNS for a DMARC TXT record at the DNS domain name matching
      this new target. A possibly empty set of records is returned.

   7. Records that do not start with a "v=" tag that identifies the current
      version of DMARC are discarded. If multiple DMARC records are returned
      for a single target, they are all discarded. If a single record
      remains and it contains a "psd=n" or "psd=y" tag, stop.

   8. Determine the target for additional queries by removing a single label
      from the target domain as described in step 5 and repeating steps 6
      and 7 until the process stops or there are no more labels remaining.
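
The record filtering in steps 2 and 7 can be sketched as follows. This is an illustration only: the names `parse_tags` and `select_record` are hypothetical, the parsing is deliberately simplified (split on ";" and "="), and "DMARC1" is taken to be the current version value.

```python
from typing import List, Optional, Tuple


def parse_tags(record: str) -> List[Tuple[str, str]]:
    """Split a TXT record into (tag, value) pairs, ignoring whitespace."""
    pairs = []
    for part in record.split(";"):
        if "=" in part:
            name, value = part.split("=", 1)
            pairs.append((name.strip(), value.strip()))
    return pairs


def select_record(txt_records: List[str],
                  stop_values: Tuple[str, ...]) -> Tuple[Optional[str], bool]:
    """Return (surviving record or None, whether the walk stops here).

    stop_values is ("n",) for step 2 and ("n", "y") for step 7.
    """
    # Discard records whose first tag is not the DMARC version tag.
    dmarc = [r for r in txt_records if parse_tags(r)[:1] == [("v", "DMARC1")]]
    if len(dmarc) != 1:
        # Nothing found, or multiple DMARC records: all are discarded.
        return None, False
    stop = any(name == "psd" and value in stop_values
               for name, value in parse_tags(dmarc[0]))
    return dmarc[0], stop


print(select_record(["v=spf1 -all", "v=DMARC1; p=none; psd=y"], ("n",)))
print(select_record(["v=spf1 -all", "v=DMARC1; p=none; psd=y"], ("n", "y")))
```

The same record therefore does not stop the walk at the starting point (step 2) but does stop it at a later target (step 7).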


To illustrate, for a message with the arbitrary RFC5322.From domain of
"a.b.c.d.e.mail.example.com", a full DNS Tree Walk would require the
following five queries to locate the policy or Organizational Domain:


   * _dmarc.a.b.c.d.e.mail.example.com
   * _dmarc.e.mail.example.com
   * _dmarc.mail.example.com
   * _dmarc.example.com
   * _dmarc.com
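
The query sequence above can be reproduced with a short sketch of steps 4, 5, and 8, assuming no record is found along the way. The function name `tree_walk_queries` and the parameter `n` (the label limit, 5 in rev -27) are illustrative names, not taken from the draft.

```python
from typing import List


def tree_walk_queries(domain: str, n: int = 5) -> List[str]:
    """List the DNS names queried by a full Tree Walk that never stops early."""
    labels = domain.split(".")
    queries = ["_dmarc." + domain]
    # Step 5: drop one label for short names, otherwise jump to n - 1 labels.
    keep = len(labels) - 1 if len(labels) < n else n - 1
    # Step 8: keep removing one label at a time until none remain.
    while keep >= 1:
        queries.append("_dmarc." + ".".join(labels[-keep:]))
        keep -= 1
    return queries


for name in tree_walk_queries("a.b.c.d.e.mail.example.com"):
    print(name)
```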



NEW

##  DNS Tree Walk {#dns-tree-walk}

The DMARC protocol defines a method for communicating information through
the publishing of records in DNS. Both the content of the records and their
location in the DNS hierarchy are used for two purposes: policy discovery
(see Section 4.7
<https://datatracker.ietf.org/doc/html/draft-ietf-dmarc-dmarcbis-27#dmarc-policy-discovery>)
and Organizational Domain determination (see Section 4.8
<https://datatracker.ietf.org/doc/html/draft-ietf-dmarc-dmarcbis-27#organizational-domain-discovery>).

The relevant DMARC record for these purposes is not necessarily the DMARC
policy record found in DNS at the same level as the name label for the
domain in question. Instead, some domains will inherit their DMARC policy
records from parent domains one level or more above them in the DNS
hierarchy. Similarly, the Organizational Domain may be found at a higher
level in the DNS hierarchy.

These records are discovered through the technique described here, known
colloquially as the "DNS Tree Walk". The target of any DNS Tree Walk is a
valid DMARC policy record, but the rules defining required content for that
record depend on the reason for performing the Tree Walk.

The Tree Walk described here is designed with two goals in mind. First, it
had to ensure that it could discover DMARC policies that might be published
many levels deep in the DNS hierarchy by both simple and complex
organizations. Examples of complex organizations include governments,
educational institutions, and healthcare organizations, which tend to use
longer than average RFC5322.From domains (e.g.,
sub-org.org.division.agency.department.gov) and which distribute DMARC
policy management rather than maintaining central control. Second, it had
to ensure unambiguous answers for organizational domain alignment and
policy discovery regardless of the number of labels in the author domain,
without opening up a DNS lookup abuse vector.

To meet both of these goals, the Tree Walk is designed to handle domains
that have up to ten labels, and a shortcut is built into the process so
that domains that have more than ten labels do not result in more than ten
DNS queries.

The generic steps for a DNS Tree Walk are as follows:


   1. Query the DNS for a DMARC TXT record at the appropriate starting point
      for the Tree Walk. A possibly empty set of records is returned.

   2. Records that do not start with a "v=" tag that identifies the current
      version of DMARC are discarded. If multiple DMARC records are returned,
      they are all discarded. If a single record remains and it contains a
      "psd=n" tag, stop.

   3. Determine the target for additional queries (if needed; see the note
      in Section 4.8
      <https://datatracker.ietf.org/doc/html/draft-ietf-dmarc-dmarcbis-27#organizational-domain-discovery>),
      using steps 4 through 8 below.

   4. Break the subject DNS domain name into a set of ordered labels. Assign
      the count of labels to "x", and number the labels from right to left;
      e.g., for "a.mail.example.com", "x" would be assigned the value 4,
      "com" would be label 1, "example" would be label 2, "mail" would be
      label 3, and so forth.

   5. If x < 10, remove the left-most (highest-numbered) label from the
      subject domain. If x >= 10, remove the left-most (highest-numbered)
      labels from the subject domain until 9 labels remain. The resulting
      DNS domain name is the new target for the next lookup.

   6. Query the DNS for a DMARC TXT record at the DNS domain name matching
      this new target. A possibly empty set of records is returned.

   7. Records that do not start with a "v=" tag that identifies the current
      version of DMARC are discarded. If multiple DMARC records are returned
      for a single target, they are all discarded. If a single record
      remains and it contains a "psd=n" or "psd=y" tag, stop.

   8. Determine the target for additional queries by removing a single label
      from the target domain as described in step 5 and repeating steps 6
      and 7 until the process stops or there are no more labels remaining.


To illustrate, for a message with the arbitrary RFC5322.From domain of
"a.b.c.d.e.f.g.h.i.j.k.l.m.n.mail.example.com", a full DNS Tree Walk would
require the following ten queries, in order, to locate the policy or
Organizational Domain:


   * _dmarc.a.b.c.d.e.f.g.h.i.j.k.l.m.n.mail.example.com
   * _dmarc.i.j.k.l.m.n.mail.example.com
   * _dmarc.j.k.l.m.n.mail.example.com
   * _dmarc.k.l.m.n.mail.example.com
   * _dmarc.l.m.n.mail.example.com
   * _dmarc.m.n.mail.example.com
   * _dmarc.n.mail.example.com
   * _dmarc.mail.example.com
   * _dmarc.example.com
   * _dmarc.com
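
For readers who want to experiment with N, the following is a rough end-to-end sketch of the generic steps with the limit set to 10. It is not the draft's normative algorithm: `tree_walk`, `lookup`, and the `zone` dictionary are hypothetical stand-ins (a real implementation would issue DNS TXT queries), the record parsing is simplified, and the record published at `k.l.m.n.mail.example.com` is invented for the example.

```python
from typing import Callable, List, Optional, Tuple


def tree_walk(domain: str,
              lookup: Callable[[str], List[str]],
              n: int = 10) -> Tuple[List[str], Optional[str]]:
    """Return (names queried, record that stopped the walk or None)."""
    labels = domain.split(".")
    queried: List[str] = []
    keep = len(labels)  # how many right-most labels form the current target
    first = True
    while keep >= 1:
        name = "_dmarc." + ".".join(labels[-keep:])
        queried.append(name)
        # Steps 2 and 7: keep DMARC records only; more than one means none.
        records = [r for r in lookup(name)
                   if r.replace(" ", "").split(";")[0] == "v=DMARC1"]
        if len(records) == 1:
            tags = records[0].replace(" ", "").split(";")
            stop_tags = ("psd=n",) if first else ("psd=n", "psd=y")
            if any(tag in tags for tag in stop_tags):
                return queried, records[0]
        # Step 5 on the first pass (jump to n - 1 labels for long names),
        # step 8 afterwards (remove a single label).
        if first and len(labels) >= n:
            keep = n - 1
        else:
            keep -= 1
        first = False
    return queried, None


# Hypothetical zone data: one sub-organization publishes its own record.
zone = {"_dmarc.k.l.m.n.mail.example.com": ["v=DMARC1; p=reject; psd=n"]}
long_name = "a.b.c.d.e.f.g.h.i.j.k.l.m.n.mail.example.com"

queried, record = tree_walk(long_name, lambda name: zone.get(name, []))
print(len(queried), record)
print(len(tree_walk(long_name, lambda name: [])[0]))
```

With the invented record in place the walk stops after four queries. With an empty zone it issues the ten queries listed above.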


-- 

*Seth Blank * | Chief Technology Officer
*e:* seth@valimail.com
*p:* 415.273.8818
