Re: [dmarc-ietf] Versioning and XML namespaces in aggregate reports (#33, #70)

Alessandro Vesely <vesely@tana.it> Fri, 14 May 2021 18:12 UTC


On Fri 14/May/2021 15:42:56 +0200 Brotman, Alex wrote:
> There are a few tickets that may break report ingestion systems due to structure and/or value changes.  Should we decide that's an implementation issue, or that we truly can't change the format of the reports?  I'm sure most ingestion systems are rather flexible given the number of reports that appear to not match what 7489 states/suggests.


Report consumers use XML libraries to retrieve the values of named fields, so
we can safely add fields.  Renaming fields or changing existing semantics would
break backward compatibility, which I think we can avoid.
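
To illustrate the point, here is a minimal sketch of a consumer that looks fields up by name, using Python's standard xml.etree.ElementTree.  The sample report, the <new_field> element, and the read_rows helper are made up for this example and are not taken from any real implementation.  It also assumes the report carries no XML namespace declaration; with a default namespace, the lookups would need qualified names.

```python
import xml.etree.ElementTree as ET

# Hypothetical report fragment. <new_field> stands in for any element
# that a later revision of the format might add.
REPORT = """\
<feedback>
  <report_metadata>
    <org_name>example.org</org_name>
    <report_id>12345</report_id>
    <new_field>ignored by older consumers</new_field>
  </report_metadata>
  <record>
    <row>
      <source_ip>192.0.2.1</source_ip>
      <count>3</count>
      <policy_evaluated>
        <disposition>none</disposition>
      </policy_evaluated>
    </row>
  </record>
</feedback>
"""


def read_rows(xml_text):
    """Extract a few named fields; elements not asked for are never touched."""
    root = ET.fromstring(xml_text)
    org = root.findtext("report_metadata/org_name")
    rows = []
    for row in root.iterfind("record/row"):
        rows.append({
            "org_name": org,
            "source_ip": row.findtext("source_ip"),
            "count": int(row.findtext("count", default="0")),
            "disposition": row.findtext("policy_evaluated/disposition"),
        })
    return rows
```

Because every lookup is by element name, the result is the same whether or not <new_field> is present.  A renamed <count>, by contrast, would silently fall back to the default here.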


> If we are going to allow changes to the structure, and there is some concern about which version the receiver supports (or prefers?), should we put a flag into the DMARC record?  And of course, that may depend on the receiver, if multiple are listed, so that would have to belong to each individual receiving address.


Overkill IMHO.


>> From: dmarc <dmarc-bounces@ietf.org> On Behalf Of Matthäus Wander
>>
>> Regarding the existing top-level <version> below <feedback>: Even if
>> parsers don't require the version to function, it remains useful for 
>> measuring the adoption of the different DMARC specifications (as
>> requested in #70). In fact, one implementation I looked at (parsedmarc)
>> uses it for only this purpose. A missing <version> is logged as "draft" 
>> schema version.

In my tiny MX I have a cache of 631 aggregate reports received recently.  121 
reports from 31 unique org_names have a /feedback/version element, 510 from 37 
organizations don't.  The latter group includes google.com, Yahoo! Inc., 
Verizon Media, Mail.Ru, ...

Perhaps someone with larger mail flows can provide better statistics.
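
For anyone who wants to run the same count on their own cache, a rough sketch follows.  It is not the script I used; tally_version and its helpers are hypothetical names, and it assumes the reports are already decompressed and available as XML strings.  Tags are compared by local name, so reports with and without a namespace declaration are counted alike.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def child(elem, name):
    """Return the first direct child whose local name matches, else None."""
    for c in elem:
        if local(c.tag) == name:
            return c
    return None


def tally_version(reports):
    """Count reports with and without /feedback/version, and collect org_names."""
    counts = {"with": 0, "without": 0}
    orgs = {"with": set(), "without": set()}
    for xml_text in reports:
        root = ET.fromstring(xml_text)
        key = "with" if child(root, "version") is not None else "without"
        counts[key] += 1
        meta = child(root, "report_metadata")
        org = child(meta, "org_name") if meta is not None else None
        if org is not None and org.text:
            orgs[key].add(org.text.strip())
    return counts, orgs
```

Calling tally_version on the list of report strings returns the two report counts and the two sets of org_names; the number of unique organizations in each group is the length of the corresponding set.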


>> Regarding the XML namespace declaration:
>> The XML schema serves not only as specification for developers, but can be
>> also used for automatic syntax checks of reports -- provided that the
>> namespace declaration is fixed. XSD validation is an immensely useful tool for
>> testing the output of report generators. It helped me to discover two nasty
>> bugs in an implementation, which appeared in 2 out of ~10k reports and
>> would have gone unnoticed otherwise.


Very much agreed.  Validating a report before sending it is a sound safeguard.
Online aggregate report checking utilities would also benefit from this
possibility.

Does the IETF provide URLs for hosting XSDs?


>> A version number within the schema is not necessary for this use case.


Or we can stick to a static <version>1.0</version>, similar to v=DMARC1, 
MIME-Version, and the like, if useful.


>> A different matter is whether automatic XSD validation on the report
>> consumer side is a supported use case. There is some value in it: two lines of
>> code suffice to perform input validation. However, the validation is strict and
>> does not allow for being liberal in what you accept (might be handy for
>> protocol police, though). Achieving upward compatibility is not trivial,
>> because there is no general "ignore all unknown elements" statement in
>> XSD. It is possible to define a <xs:any> placeholder in the schema, but this
>> element must be inserted explicitly into each place where extensibility is
>> desired. This would require careful foresight in the schema design.


Designing an abstract extension for ARC is going to be particularly challenging.


Best
Ale
--