[dmarc-ietf] nits in draft-ietf-dmarc-aggregate-reporting-02

Martin Kealey <martin@kurahaupo.gen.nz> Fri, 07 May 2021 05:24 UTC

I'm not quite sure how I'm supposed to submit nitpickery like this, so if
there's a better forum please let me know.

1. Filename & content-type

Section 2.6.1, among other things, says that the filename of the MIME part
containing the report MUST end with ".xml" or ".xml.gz", yet the example
given ends with neither (it ends with just ".gz").

The main use for this name is as a unique report identifier; its use as a
filename is entirely secondary and only relevant to manual processing by a
human, so MUST seems quite excessive.

It seems like there are separate drivers for each part of the filename
suffix, and perhaps they should be two independent SHOULD requirements. If
we want to facilitate its use as a filename, perhaps we should just say
that the filename SHOULD be universally unique and MUST NOT contain "/" or
start with ".".
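To make the two independent suffix signals and the proposed filename constraints concrete, here is a minimal Python sketch. The function names are hypothetical, the UUID-based name is just one illustrative way to get a universally unique filename, and rejecting an empty name is my own assumption rather than anything in the draft.

```python
import uuid


def is_acceptable_report_filename(name: str) -> bool:
    """Proposed rule: no "/" anywhere and no leading "."."""
    if not name:               # assumption: an empty name is never acceptable
        return False
    if "/" in name:            # would be interpreted as a path separator
        return False
    if name.startswith("."):   # excludes hidden files, "." and ".."
        return False
    return True


def suffix_flags(name: str) -> tuple[bool, bool]:
    """Treat the two suffixes as independent signals: (is_xml, is_gzip)."""
    is_gzip = name.endswith(".gz")
    base = name[: -len(".gz")] if is_gzip else name
    return base.endswith(".xml"), is_gzip


def make_report_filename(gzipped: bool = True) -> str:
    """One illustrative way to get a universally unique name (UUID4)."""
    return f"{uuid.uuid4()}.xml" + (".gz" if gzipped else "")
```

A name ending only in ".gz" yields (False, True) here: perfectly usable as an identifier, but rejected under the draft's current MUST.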

It seems strange to vary the content-type based solely on what amounts to a
transport optimization, namely gzip; this smells of working around
deficiencies in other standards. (From the perspective of an application
using email as a transport, it would seem to make more sense to allow
"content-transfer-encoding" to be a chain such as "base64+gzip", or
alternatively, for "content-type" to accept the addition of a "gzip/"
prefix, forming "gzip/text/xml". However I digress, as that's a discussion
for an entirely different standards track.)
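Purely to illustrate the digression: a receiver handling a hypothetical chained transfer encoding such as "base64+gzip" would simply undo each layer in turn, leaving the content-type (and the XML inside) untouched. No current standard defines this chained syntax or a "gzip" transfer encoding; the names below are assumptions for the sketch.

```python
import base64
import gzip

# Hypothetical registry: these are NOT valid Content-Transfer-Encoding
# values under any current standard; they only illustrate the idea.
_DECODERS = {"base64": base64.b64decode, "gzip": gzip.decompress}
_ENCODERS = {"base64": base64.b64encode, "gzip": gzip.compress}


def decode_chain(payload: bytes, chain: str) -> bytes:
    """Undo each layer of e.g. "base64+gzip", outermost (leftmost) first."""
    for step in chain.split("+"):
        payload = _DECODERS[step.strip().lower()](payload)
    return payload


def encode_chain(data: bytes, chain: str) -> bytes:
    """Apply layers innermost (rightmost) first, so decode_chain reverses it."""
    for step in reversed(chain.split("+")):
        data = _ENCODERS[step.strip().lower()](data)
    return data
```

The content-type would stay "application/xml" throughout; compression is visible only at the transport layer.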

According to RFC 7303 §9.2 <https://tools.ietf.org/html/rfc7303#section-9.2>,
the "text/xml" content-type is merely an alias for "application/xml". Other
documents, such as a related W3C draft
<https://www.w3.org/2006/02/son-of-3023/draft-murata-kohn-lilley-xml-04.html#textxml>,
go further and actively declare it deprecated.

It seems to me that RFC 7303 §4.2
<https://tools.ietf.org/html/rfc7303#section-4.2> and RFC 6838 §4.2.5
<https://tools.ietf.org/html/rfc6838#section-4.2.5>, taken together, indicate
that registering a content type such as *application/dmarc-feedback+xml*
would be appropriate.
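For illustration, a report generator using Python's standard email package could label the report part with such a type as below. Note that application/dmarc-feedback+xml is only the suggested name, not a registered media type; the function name and body text are hypothetical.

```python
from email.message import EmailMessage

# Proposed (unregistered) media type for DMARC aggregate reports.
REPORT_MAINTYPE = "application"
REPORT_SUBTYPE = "dmarc-feedback+xml"


def build_report_message(xml_bytes: bytes, filename: str) -> EmailMessage:
    """Wrap an uncompressed XML report in a message using the proposed type."""
    msg = EmailMessage()
    msg.set_content("DMARC aggregate report attached.")  # illustrative body text
    # For bytes data, add_attachment needs an explicit maintype/subtype;
    # the part is base64-encoded for transport by default.
    msg.add_attachment(
        xml_bytes,
        maintype=REPORT_MAINTYPE,
        subtype=REPORT_SUBTYPE,
        filename=filename,
    )
    return msg
```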

2. Size limit

I'm concerned that specifying the maximum report size *after* compression
is possibly focussing on the wrong costs, and distorts the conceptual model:

   1. It implies that the *compressed* file is the relevant artefact being
   transported, which leads to the weirdness with filenames and content-types
   mentioned above.
   2. The size of the report is trivial compared with the size of the
   messages it's reporting on, both in terms of storage and bandwidth, and
   gzip decompression is very cheap, so compression makes negligible
   difference to those costs.
   3. The cost of processing the received report to incorporate it into the
   bulk reporting correlates more closely with its "uncompressed" size. In
   particular, the memory footprint of the receiver process is likely to be
   correlated with this limit, especially if its first step is to build an
   in-memory DOM from the XML. (I would be surprised if any real
   report-accepting system *didn't* work this way.)
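If the limit were expressed on the uncompressed size, a receiver could enforce it while decompressing, before any DOM is built, which also guards against highly compressible "gzip bomb" inputs. A minimal sketch using Python's zlib module; the function name, exception class and chunk size are assumptions, and it only handles a single gzip member.

```python
import zlib


class ReportTooLarge(Exception):
    """Raised when the uncompressed report exceeds the configured limit."""


def gunzip_with_limit(data: bytes, limit: int, chunk: int = 64 * 1024) -> bytes:
    """Decompress gzip data, never holding much more than `limit` bytes of output."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    out = bytearray()
    pending = data
    while not d.eof:
        # Produce at most `chunk` bytes per call; leftover input is kept
        # in d.unconsumed_tail for the next iteration.
        piece = d.decompress(pending, chunk)
        out += piece
        if len(out) > limit:
            raise ReportTooLarge(f"uncompressed report exceeds {limit} bytes")
        pending = d.unconsumed_tail
        if not piece and not pending:
            # No input left and no output produced: the stream is truncated.
            raise ValueError("truncated gzip data")
    return bytes(out)
```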


3. Scheduling

Concern about processing load also brings me to section 2.4.2, which
essentially directs everyone to send their reports simultaneously. Since
the receiver needs to be able to handle reports with any reporting period, it
seems likely that having most but not all reports arrive at the same time
would be the worst outcome, requiring both (a) complex code to cope with
asynchronous reporting, and (b) the ability to cope with high load spikes (or
to tolerate delays while the reports are spooled for batch processing).

It also imposes a load spike on the report generators, which must generate
all their reports at once (or spool them and delay), but at least they can
derive some benefit from not having overlapping reporting periods.

In the scheme of things this isn't a huge load compared with the actual
processing of email, but it would still seem preferable to allow the
receiver to specify their preference in this regard, at least choosing
between "UTC synchronized" and "randomized", or else for this document to
specify "randomized" as the default.
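A sketch of what the two policies might look like for a report generator using daily reporting periods aligned to UTC midnight. The mode names "utc-synchronized" and "randomized", and the four-hour maximum delay, are hypothetical; neither the draft nor DMARC defines a tag for this.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Optional


def utc_day_end(t: datetime) -> datetime:
    """End of the UTC day containing t, i.e. the following 00:00 UTC."""
    t = t.astimezone(timezone.utc)
    return t.replace(hour=0, minute=0, second=0, microsecond=0) + timedelta(days=1)


def next_send_time(period_end: datetime, mode: str,
                   max_delay: timedelta = timedelta(hours=4),
                   rng: Optional[random.Random] = None) -> datetime:
    """When to send the report for a reporting period ending at period_end."""
    if mode == "utc-synchronized":
        # Every sender fires at the boundary: the load spike described above.
        return period_end
    if mode == "randomized":
        # Spread senders uniformly over [period_end, period_end + max_delay).
        rng = rng or random.Random()
        offset = rng.randrange(int(max_delay.total_seconds()))
        return period_end + timedelta(seconds=offset)
    raise ValueError(f"unknown scheduling mode: {mode!r}")
```

Seeding the generator from, say, the report destination would instead give each receiver a stable offset; that choice is left open here.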

-Martin