Re: [dmarc-ietf] DMARC bis: ticket 69: add JSON reporting format?

Alessandro Vesely <vesely@tana.it> Thu, 21 May 2020 10:38 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/dmarc/VGTEpCY7wGPGIL-l0JPZJkuocI0>

On Wed 20/May/2020 22:00:34 +0200 Hector Santos wrote:
> On 5/20/2020 2:43 PM, Alessandro Vesely wrote:
> 
>> I mean, what is the CSV format of the following report, that I sent yesterday
>> for this list:
> 
> Sorry, if I ignored it.
> 
> Forgetting fact that you can your report easier to read for consumers, these
> would be an example of the CSV field headers.
> 
> CSV headers:
> 
> report_metadata.org_name, report_metadata.email, report_metadata.report_id,
> report_metadata.date_range.begin, report_metadata.date_range.end
> 
> Policy_Published.domain, Policy_Published.adkim, Policy_Published.aspf,
> Policy_Published.p
> 
> record.row, record.row.source_ip.record.count. record.row.policy_evaluated,
> record.row.policy_evaluated.disposition,
> record.row,policy_evaluated.disposition, record.row.policy_evaluated.dkim,
> record.row.policy_evaluated.spf
> 
> Note: You don't have to stick to redundant "name space" field names.


You didn't include auth_results.  That's tricky because you can have zero or
more dkim and spf results.  That would bring on a variable number of columns,
with no hint about which is which.
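
To make the column problem concrete, here is a minimal sketch in Python. It
assumes two made-up <record> elements (the addresses and domains are
placeholders, not data from the report discussed here) and a hypothetical
flatten() helper; it is only meant to show that the number of cells per row
follows the number of dkim and spf results.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one record with two DKIM results, one with none.
SAMPLE = """
<feedback>
  <record>
    <row><source_ip>192.0.2.1</source_ip><count>2</count></row>
    <auth_results>
      <dkim><domain>example.org</domain><result>pass</result></dkim>
      <dkim><domain>list.example.net</domain><result>fail</result></dkim>
      <spf><domain>example.org</domain><result>pass</result></spf>
    </auth_results>
  </record>
  <record>
    <row><source_ip>192.0.2.2</source_ip><count>1</count></row>
    <auth_results>
      <spf><domain>example.com</domain><result>softfail</result></spf>
    </auth_results>
  </record>
</feedback>
"""


def flatten(record):
    """Return one CSV-style row: source_ip, count, then every auth result."""
    row = [record.findtext("row/source_ip"), record.findtext("row/count")]
    for result in record.find("auth_results"):
        # Each dkim/spf element adds two cells, so row length varies per record.
        row += [result.findtext("domain"), result.findtext("result")]
    return row


rows = [flatten(r) for r in ET.fromstring(SAMPLE.strip()).iter("record")]
for row in rows:
    print(len(row), ",".join(row))
```

The first row comes out with eight cells and the second with four, and
nothing in a cell says whether it belongs to a dkim or an spf result.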

As Freddie noted, repeating the headers for every record may sound a little
wasteful, but gzip may come to the rescue here too.

As for readability, aggregate reports are designed to be machine-readable to
the extreme detriment of any kind of human readability.  The best example of
that attitude is the date_range elements, which carry bare epoch seconds.
HTML provides much better human readability.  (Indeed, it is one of the
formats you mentioned, as it is supported by Google Docs).  I attach the HTML
equivalent of the data I sent yesterday.  Posters whose From: was rewritten
can easily spot their row, which is exactly what readability means.
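
As a small illustration of the date_range point, the sketch below turns the
begin and end values into something a person can read. The readable() helper
and the two sample values are assumptions chosen for the example, not taken
from the attached report.

```python
from datetime import datetime, timezone


def readable(epoch_seconds):
    """Format a date_range value (seconds since the epoch, UTC) for humans."""
    moment = datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


# Hypothetical begin/end values as they would appear inside <date_range>.
begin, end = "1589932800", "1590019199"
print(readable(begin), "to", readable(end))
```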

Let me note that such htmlization is the result of a DMARC XSLT applied by a
mail filter after the rua messages are authenticated but before delivery.
That way, a readable format of the reports is ready in the mail folder
whenever I care to look at it, *as if it had been written in HTML in the
first place*.  By similar techniques one can obtain JSON, SQL, TXT, ….  In
the face of such flexibility, I'm puzzled by your asking for more.
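
The same idea can be sketched without XSLT. The following is not the
stylesheet mentioned above but a stand-in that uses only Python's standard
library; the to_dict() helper and the tiny REPORT string are assumptions made
up for the example. It shows that a consumer who knows the XML format in
advance can derive JSON locally.

```python
import json
import xml.etree.ElementTree as ET


def to_dict(element):
    """Recursively convert an element; repeated child tags become lists."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = to_dict(child)
        if child.tag in result:
            # Second or later occurrence of a tag: collect values in a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Hypothetical, heavily trimmed aggregate report.
REPORT = """<feedback>
  <report_metadata><org_name>example.org</org_name></report_metadata>
  <record><row><source_ip>192.0.2.1</source_ip><count>2</count></row></record>
  <record><row><source_ip>192.0.2.2</source_ip><count>1</count></row></record>
</feedback>"""

as_json = json.dumps(to_dict(ET.fromstring(REPORT)), indent=2)
print(as_json)
```

Note that a tag occurring once yields a plain value while a repeated tag
yields a list, so a real converter would want to fix the shape per element
name rather than infer it from the data.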


>>>> ...  Can we get back to work, please?
>>>
>>> Sorry, but I consider a rude, disrespectful and ignorant statement, to be
>>> saying that.
>>
>>
>> No personal attack intended.  I'm being rude because I have the impression that
>> you are not defending a concrete, well defined need, but instead find new
>> arguments opportunistically to pursue a vague sense of format fashion.
> 
> That's a personal attack. If you don't understand the proposal, you should back
> off or ask for clarification.


I /am/ asking for clarifications.  Please excuse my tone if it hurts you.  I'll
try to keep calm, but please refrain from technically flaky arguments such as
CSV.  I think you're perfectly aware of the capabilities I'm exemplifying here.
They rest on the fact that a report consumer knows in advance the format of
the data inside the received gzip.  Hence, that flexibility would be destroyed
by the introduction of multiple formats, as they would come randomly at the
mercy of the report generators, any prf= notwithstanding.  Not a great loss,
given your feeling about reporting?


>> You shifted from an asserted necessity of producers to a possible desire
>> of consumers.
>
> I did no such thing. I won't repeat it, but it appears you didn't understand
> the proposal.


For sure, I don't understand the proposal.  Let's start over:

On Sat 16/May/2020 19:48:25 +0200 Hector Santos wrote:
> Just consider, when the spec has XML-only, then for those who use a solid
> JSON I/O system, they are now going to be required to add XML. So for them,
> its additional development complexity.  Everything they probably do JSON
> related. The exception would be DMARC using XML. This alone can delay or
> push aside DMARC Reporting implementation.


Does DMARC Reporting implementation mean generation or consumption?


On Wed 20/May/2020 02:23:37 +0200 Hector Santos wrote:
> I suggest that there is a new tag that provides a "Preferred Report Format"
> or "prf=" tag using registered acronymns for long time "standard" formats.
> For example:
>
> prf=cvs,json,xml,afrf,iodef
>
> [...]
> 
> The verifier will do what can it offer. The publisher is providing a
> preference, that it may not get.


That's 100% uncertainty of the result, isn't it?  How is flexibility improved?
I cannot store a received report as-is in Google Docs anyway, can I?


Best
Ale
--