Re: [dmarc-ietf] Clarification about data integrity within Aggregate Reports (Ticket #40)

Alessandro Vesely <vesely@tana.it> Tue, 05 January 2021 11:20 UTC

Return-Path: <vesely@tana.it>
X-Original-To: dmarc@ietfa.amsl.com
Delivered-To: dmarc@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 298173A040B for <dmarc@ietfa.amsl.com>; Tue, 5 Jan 2021 03:20:46 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.382
X-Spam-Level:
X-Spam-Status: No, score=-2.382 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, NICE_REPLY_A=-0.262, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=unavailable autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1152-bit key) header.d=tana.it
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id PfjinwcxsU4f for <dmarc@ietfa.amsl.com>; Tue, 5 Jan 2021 03:20:42 -0800 (PST)
Received: from wmail.tana.it (wmail.tana.it [62.94.243.226]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id A34173A03F8 for <dmarc@ietf.org>; Tue, 5 Jan 2021 03:20:42 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tana.it; s=delta; t=1609845639; bh=hK5EEX7hDdTwWN54BzdfVj99mHhrvyjc0Y875rur82s=; l=4955; h=To:References:From:Date:In-Reply-To; b=BS1p2cc73xa4+SPtqyu/r0f+bmqydywTwsbcQXpPFbWvek9jFx1grOxj/Uh4JyARm jQsdJ0nGECpyOBGjjARVZmPx/U1vok1f40yPvCRyD4+TwNYkgA1nEmDNKL86OvBbQx sWwjeiXJd8E9uZJI1HGLrRbgP7sv+JtAFm3izxQJUuePJLu8z4DZ3auZ0NsLh
Authentication-Results: tana.it; auth=pass (details omitted)
Original-From: Alessandro Vesely <vesely@tana.it>
Received: from [172.25.197.111] (pcale.tana [172.25.197.111]) (AUTH: CRAM-MD5 uXDGrn@SYT0/k, TLS: TLS1.3, 128bits, ECDHE_RSA_AES_128_GCM_SHA256) by wmail.tana.it with ESMTPSA id 00000000005DC053.000000005FF44B87.0000277F; Tue, 05 Jan 2021 12:20:38 +0100
To: "Brotman, Alex" <Alex_Brotman=40comcast.com@dmarc.ietf.org>, "dmarc@ietf.org" <dmarc@ietf.org>
References: <MN2PR11MB435151665586B5A40D101103F7D70@MN2PR11MB4351.namprd11.prod.outlook.com> <59d0e09e-9296-c16e-94b1-8f344faf88d2@tana.it> <MN2PR11MB4351CB31FD6C315422559BDCF7D20@MN2PR11MB4351.namprd11.prod.outlook.com>
From: Alessandro Vesely <vesely@tana.it>
Message-ID: <9c362df2-8338-75f6-d8c2-256124842cda@tana.it>
Date: Tue, 05 Jan 2021 12:20:37 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.6.0
MIME-Version: 1.0
In-Reply-To: <MN2PR11MB4351CB31FD6C315422559BDCF7D20@MN2PR11MB4351.namprd11.prod.outlook.com>
Content-Type: text/plain; charset="us-ascii"; format="flowed"
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/dmarc/-omxO8A8CuDeDBpqFuV4DKgZdgU>
Subject: Re: [dmarc-ietf] Clarification about data integrity within Aggregate Reports (Ticket #40)
X-BeenThere: dmarc@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: "Domain-based Message Authentication, Reporting, and Compliance \(DMARC\)" <dmarc.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/dmarc>, <mailto:dmarc-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/dmarc/>
List-Post: <mailto:dmarc@ietf.org>
List-Help: <mailto:dmarc-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/dmarc>, <mailto:dmarc-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Jan 2021 11:20:46 -0000

On Mon 04/Jan/2021 16:53:20 +0100 Brotman, Alex wrote:
> -----Original Message----- From: dmarc <dmarc-bounces@ietf.org> On Behalf Of Alessandro Vesely
>> On Wed 30/Dec/2020 23:18:35 +0100 Brotman, Alex wrote:
>>>
>>> There's an open ticket (https://trac.ietf.org/trac/dmarc/ticket/40) noting
>>> that we should clarify what constitutes valid data in a report. For 
>>> example, the report cannot state that DMARC-DKIM was a "pass" when DKIM
>>> itself was a failure.
>>>
>>> It seems like the gist is that within the report it should never happen that
>>> DKIM or SPF are noted to have passed in the context of DMARC if they have
>>> not passed on their own.  [...]
>>>
>>> Does that seem properly summarized?
>>
>> If the aggregate report content, Section 2.2, was well explained, the above
>> text would be redundant.  The point is that Section 2.2 looks like a high level
>> list of features.  It is completely useless for implementing a report producer,
>> let alone a consumer.  We have to rewrite that section, possibly trying to
>> re-use the same wording and the same order of appearance of concepts, so as
>> to minimize readers' confusion, but strictly matching the content of Appendix
>> A (was Appendix C).
> 
> I don't think I disagree here, but I want to be clear on what you're requesting.  You'd like to see a more verbose description of the goal of the aggregate report, as well as the contents?
> 
> "... Each report MUST contain data for only one 5322 domain.  The values reported MUST be as evaluated from the original message, not from the local policy overrides ..." and so on?


Yes, more or less.

Section 2.2 is fine up to the sentence "The format for these reports is defined
in Appendix C."  (OK, that should be Appendix A.)  After that, it should
describe what the content must actually be.  I'd start, for example, like so:

     Aggregate reports (feedback) consist of two parts, a header and a set of
     records.  The header holds the report metadata (report_metadata) as well as
     the policy discovered for the domain that is the subject of the report
     (policy_published).  Each report MUST contain data for only one domain.
     That policy MUST correspond to the DMARC record that the subject domain
     published during the reporting period.  If the subject domain modified its
     DMARC record during the period, a report generator MAY create multiple
     reports for the same domain, one for each period during which the record
     did not vary.  Otherwise, a generator SHOULD report the last policy it
     found.

     The set of records (record)...

I think we can skip elements whose content is obvious, but we should say
something about the various cases.  I can write more text once we settle on
the right tone.  Is the above too lengthy?
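
To make the MAY clause above concrete, here is a minimal Python sketch of how a generator might split a reporting period into one report per domain and per unchanged DMARC record.  The names (Observation, split_reports) and the dictionary layout are illustrative assumptions, not taken from any draft or implementation.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Observation:
    """One evaluated message (hypothetical layout, for illustration only)."""
    timestamp: int      # seconds since the epoch
    domain: str         # domain whose DMARC record was looked up
    dmarc_record: str   # the record as discovered at evaluation time


def split_reports(observations):
    """Return one report per domain and per run of an unchanged DMARC record."""
    reports = []
    ordered = sorted(observations, key=lambda o: (o.domain, o.timestamp))
    for domain, per_domain in groupby(ordered, key=lambda o: o.domain):
        # Within a domain, a new group (hence a new report) starts
        # whenever the discovered record differs from the previous one.
        for record, run in groupby(per_domain, key=lambda o: o.dmarc_record):
            run = list(run)
            reports.append({
                "domain": domain,
                "policy_published": record,
                "begin": run[0].timestamp,
                "end": run[-1].timestamp,
                "count": len(run),
            })
    return reports
```

A generator that prefers a single report per domain would instead keep only the last record it found, as the SHOULD in the proposed text suggests.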


>> The consistency checks above can be useful for building verification tools.
>>
>> Let me take this occasion to recall that there are XML syntax check tools that
>> can be used to automatically verify the syntax based on the schema.  We
>> should write a more compliant XML in order to use them.
> 
> I've written an RNC (Relax NG Compact) previously.  As we get closer to finalizing any format changes, we can create that, and then use jing/pyjing to validate XML reports.


jing is a nice utility.  I have used svalidate and xmlstarlet, which provide
similar validation.


>  You'd want to use this to enforce report contents as specified above?


Absolutely.  Let me note that, to pass tests using the utilities mentioned 
above, reports should start something like so:

     <?xml version="1.0" encoding="UTF-8"?>
     <dmarc:feedback xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:dmarc="http://dmarc.org/dmarc-xml/0.1"
         xs:schemaLocation="http://dmarc.org/dmarc-xml/0.1 rua.xsd">
         <report_metadata>
         ...

(replace schemaLocation as appropriate)

The above form is currently not used.  Note that the "version" element becomes 
irrelevant when a schema is given.
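
As a quick pre-check before running a real validator, a consumer could test whether a report carries this boilerplate at all.  The following is only a sketch using Python's standard library; the function name is my own invention, and it does not validate anything against the schema (that remains the job of jing, xmlstarlet and the like).

```python
import xml.etree.ElementTree as ET

DMARC_NS = "http://dmarc.org/dmarc-xml/0.1"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"


def has_schema_boilerplate(xml_bytes):
    """Return True if the root is a namespace-qualified feedback element
    that carries a schemaLocation hint for the DMARC namespace."""
    root = ET.fromstring(xml_bytes)
    if root.tag != f"{{{DMARC_NS}}}feedback":
        return False
    location = root.get(f"{{{XSI_NS}}}schemaLocation", "")
    # schemaLocation is a whitespace-separated list of
    # (namespace, schema URL) pairs; look at the namespaces only.
    tokens = location.split()
    return DMARC_NS in tokens[0::2]
```

Reports in the form currently in use, with a bare <feedback> root, would simply return False here.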

It is easy for a developer who is upgrading her report producer to add the
correct boilerplate.  By comparison, it is more difficult to add selectors or
to switch from .zip to .gz.


>  Not that I'm opposed, though, it can't enforce that the sender is using the proper values for DKIM pass/fail, only that the value is one of "pass"/"fail" (at least for RNC).  I don't believe there's a way to create interrelated dependencies where we could say that "DMARC can't possibly pass if both SPF and DKIM fail, and the report should be bogus if that's the case".


Right.  On the other hand, it makes little sense to check consistency if the 
syntax is bad.

We could create a web-based DMARC checking tool.  You go there and request a
check, authenticating yourself and providing a few email addresses for the
test.  The tool will then send you various messages from various IPs/domains
and then check the aggregate reports you produce.  That would include the
consistency checks of ticket #40.
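
As a rough illustration of what such a consistency check might look like, here is a Python sketch that flags records where policy_evaluated claims a DKIM or SPF pass that no raw auth_results entry supports.  Element names follow the existing report schema; the function name is hypothetical, and the sketch deliberately ignores identifier alignment, which a real tool would also have to verify.

```python
import xml.etree.ElementTree as ET


def inconsistent_records(xml_bytes):
    """Return (record index, mechanism) pairs where the DMARC result
    says "pass" but no raw authentication result passed."""
    problems = []
    root = ET.fromstring(xml_bytes)
    for index, record in enumerate(root.iter("record")):
        for mech in ("dkim", "spf"):
            evaluated = record.findtext(f"row/policy_evaluated/{mech}") or ""
            raw = [
                (result.text or "").strip()
                for result in record.findall(f"auth_results/{mech}/result")
            ]
            # A DMARC-level pass needs at least one underlying pass.
            if evaluated.strip() == "pass" and "pass" not in raw:
                problems.append((index, mech))
    return problems
```

An empty list does not prove that a report is correct, only that it avoids this particular contradiction.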


Best
Ale
--