[dmarc-ietf] DMARC aggregate reports XML Schema inconsistencies

"Freddie Leeman" <freddie@leemankuiper.nl> Wed, 31 July 2019 09:47 UTC

From: Freddie Leeman <freddie@leemankuiper.nl>
To: dmarc@ietf.org
Date: Wed, 31 Jul 2019 11:47:29 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/dmarc/kZI0bytLU9uaMmh4yQB_FTJSKX4>

I've been processing millions of DMARC aggregate reports from many
different organizations, and have been trying to make sense of them for
quite some time now. I've noticed that most of them, even those from large
parties like Google and Yahoo!, fail to follow the DMARC RFC guidelines
(Appendix C. DMARC XML Schema). I've written a blog post about this, which
can be found here:
https://www.uriports.com/blog/dmarc-reports-ietf-rfc-compliance/

 

The bottom line is that RFC 7489 Appendix C is a mess and contradicts
itself numerous times in both schema and comments. I think it's important to
be clearer and stricter about the XML elements and their values. Too much of
this section is open to interpretation.

 

Some examples: 

 

The report has an element named "policy_published". This name indicates
that the elements within it contain the domain's published policy. The
comments, however, mention "applied" and "apply". Most organizations that
send aggregate reports do not send failure reports and thus do not "apply"
the "fo" (failure reporting options) element. This is why parties like
Google leave this element out of their reports. This particular element's
comment ("failure reporting options in effect") also implies that it is
optional. On the other hand, this element has the default "minOccurs" value
of 1, so it should not be omitted.

 

The RFC should also be clearer about what to do with policy elements that
are unspecified in the domain's DNS record. I think it is best to fill these
elements in the report with their respective default values. So when 'pct'
is not specified in the domain's policy, the report should state '100'. When
'sp' is not specified, it should have the value of the 'p' element.
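
To illustrate the default-filling idea, here is a minimal Python sketch of
how a report consumer might normalize a "policy_published" element. The
function name fill_policy_defaults is hypothetical. The 'pct' and 'sp'
rules are the ones described above; the defaults for 'adkim', 'aspf' and
'fo' are my reading of the tag definitions in RFC 7489 Section 6.3. Missing
elements are simply appended, so the schema's element order is not
preserved.

```python
import xml.etree.ElementTree as ET

# Fixed defaults for tags absent from the DNS record
# (assumed from RFC 7489 Section 6.3).
STATIC_DEFAULTS = {"adkim": "r", "aspf": "r", "pct": "100", "fo": "0"}


def fill_policy_defaults(policy_published: ET.Element) -> ET.Element:
    """Fill missing or empty policy elements in place with their defaults."""

    def ensure(tag: str, value: str) -> None:
        child = policy_published.find(tag)
        if child is None:
            # Appended at the end; schema order is not restored here.
            child = ET.SubElement(policy_published, tag)
        if not (child.text or "").strip():
            child.text = value

    for tag, value in STATIC_DEFAULTS.items():
        ensure(tag, value)

    # 'sp' has no fixed default: it inherits the value of 'p'.
    p = (policy_published.findtext("p") or "").strip()
    if p:
        ensure("sp", p)
    return policy_published
```

A report generator could apply the same rules before serializing, as long
as it also writes the elements in the order the schema requires.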

 

I've also noticed that most parties do not specify the PolicyOverrideType,
even when both SPF and DKIM alignment fail. So this element should be made
mandatory whenever alignment fails and the disposition doesn't follow the
domain's DMARC policy.
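
A minimal Python sketch of the check I have in mind, run against a single
"record" element. The function names are hypothetical, and the caller is
assumed to pass in the policy that should have applied to the message ('p'
or 'sp', as appropriate); the sketch does not try to work that out itself.

```python
import xml.etree.ElementTree as ET


def _text(parent: ET.Element, path: str) -> str:
    """Return the stripped, lower-cased text at path, or '' if absent."""
    return (parent.findtext(path) or "").strip().lower()


def missing_override_reason(record: ET.Element, expected_policy: str) -> bool:
    """True if alignment failed, the disposition deviates from the
    expected policy, and no override reason type is reported."""
    evaluated = record.find("row/policy_evaluated")
    if evaluated is None:
        return False
    both_failed = (
        _text(evaluated, "dkim") == "fail" and _text(evaluated, "spf") == "fail"
    )
    deviates = _text(evaluated, "disposition") != expected_policy.strip().lower()
    has_reason = any(
        _text(reason, "type") for reason in evaluated.findall("reason")
    )
    return both_failed and deviates and not has_reason
```

Records flagged by such a check are the ones where a domain owner currently
has to guess why the policy was not followed.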

 

The RFC guidelines for aggregate reports should also state that empty
elements with a minOccurs of 0 should be omitted and not be left blank.
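
As a rough Python sketch, a generator could prune blank optional elements
just before serializing the report. The function name and the
optional_tags parameter are hypothetical; the caller decides which element
names count as optional (minOccurs of 0), the sketch does not read the
schema.

```python
import xml.etree.ElementTree as ET


def drop_empty_optional(root: ET.Element, optional_tags: set) -> int:
    """Remove optional elements that have no text and no children.

    Returns the number of elements removed.
    """
    removed = 0
    # Snapshot the tree first so removals do not disturb the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            is_blank = len(child) == 0 and not (child.text or "").strip()
            if child.tag in optional_tags and is_blank:
                parent.remove(child)
                removed += 1
    return removed
```

For example, calling drop_empty_optional(report, {"selector"}) would remove
every blank 'selector' element and leave filled-in ones untouched.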

 

It should also be specified that if a message is not signed with DKIM, the
'DKIMAuthResultType' should be omitted, so the 'DKIMResultType' 'none'
would never be used. When a message has no signatures, it also has no
'domain' (d=) (minOccurs 1) or 'selector' (s=) (minOccurs 0) to report.
What happens now is that some organizations report unsigned messages with
the 'dkim' element and fill the 'domain' and 'selector' with a bogus 'none'
value.
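
Until that is specified, a report consumer has to filter these placeholder
entries out itself. A minimal Python sketch, with a hypothetical function
name, that treats a 'dkim' element as a placeholder when its result is
'none' or its domain is empty or the literal string 'none':

```python
import xml.etree.ElementTree as ET


def drop_unsigned_dkim(auth_results: ET.Element) -> int:
    """Remove 'dkim' children that only say the message was unsigned.

    Returns the number of elements removed.
    """
    removed = 0
    for dkim in list(auth_results.findall("dkim")):
        result = (dkim.findtext("result") or "").strip().lower()
        domain = (dkim.findtext("domain") or "").strip().lower()
        # Assumption: a result of 'none' or a placeholder domain means
        # there was no signature to report on.
        if result == "none" or domain in ("", "none"):
            auth_results.remove(dkim)
            removed += 1
    return removed
```

After this pass, an 'auth_results' element without any 'dkim' children
means the message carried no DKIM signature.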

 

There are also multiple mentions of minOccurs="1", even though the document
specifies that, unless otherwise specified in the schema, the minOccurs and
maxOccurs values for each element are set to 1. This adds to the confusion.
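
The redundant declarations are easy to list mechanically. A small Python
sketch, with a hypothetical function name, that takes the schema text and
returns the elements carrying an explicit minOccurs="1":

```python
import xml.etree.ElementTree as ET

# Namespace of XML Schema (XSD) elements, in ElementTree's {uri}tag form.
XS = "{http://www.w3.org/2001/XMLSchema}"


def redundant_min_occurs(xsd_text: str) -> list:
    """Return names of xs:element declarations with an explicit
    minOccurs="1", which only restates the default."""
    root = ET.fromstring(xsd_text)
    return [
        element.get("name") or element.get("ref")
        for element in root.iter(XS + "element")
        if element.get("minOccurs") == "1"
    ]
```

Dropping the attribute from every element in that list would leave the
schema's meaning unchanged.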

 

DMARC reporting capabilities are a valuable aspect of the DMARC mechanism.
They can help domain owners in setting up and hardening their DKIM/SPF/DMARC
policy. But unless these reports follow strict guidelines, they just pile up
into a lot of inconsistent data open to interpretation and guesswork. Domain
owners should be able to understand the data without the need for a
spiritual voodoo DMARC guru (trademark pending) to make sense of it all.

 

Kind regards,

Freddie Leeman